CAROTENOID BIOSYNTHESIS
FIELD OF THE INVENTION

This invention relates to materials and methods for making carotenoids.

BACKGROUND 

Carotenoids have significant utility in pigment and antioxidant applications. For
example, many of the red, yellow, and orange colors observed in nature are pigments
provided by one or more carotenoids. Carotenoids are among the best antioxidants
provided by nature, orders of magnitude better than other naturally available materials
such as vitamin C or vitamin E. The carotenoid molecule comprises multiples of the
isoprene molecule, a C5 hydrocarbon with two double bonds. In view of the dual
unsaturation of the isoprene molecule, the class of carotenoid molecules is characterized
by long organic chains with conjugated double bonds. It has been shown that the high
antioxidant capacity and the vivid pigmentation are directly attributable to the long chains
of conjugated double bonds. For example, Conn et al. (J. Photochemistry Photobiology B,
11: 41-47, 1991) compared the common β-carotene, a C40 carotenoid having 11
conjugated double bonds, with a chemically synthesized C50 β-carotene having 15
conjugated double bonds and with a chemically synthesized C60 β-carotene having 19
conjugated double bonds. The Conn et al. study concluded, based on quenching of
singlet oxygen, that the efficiency of antioxidant activity increased with increasing
numbers of conjugated double bonds.

The literature is replete with details concerning the biosynthesis of C40
carotenoids, including details concerning the associated genes and the enzymes encoded
by the genes. However, the biosynthesis and biochemical properties of C>40 carotenoids
are poorly understood relative to the level of knowledge of C40 carotenoids. Ironically,
C>40 carotenoids have the potential to be more effective antioxidants, to provide greater
health benefits, and to generate novel improved colored pigments (i.e., pigments with
longer-wavelength absorbance maxima).

There are numerous reports in the literature of bacteria that are capable of
producing C50 carotenoids. Examples of such bacteria include Halobacterium
salinarium, Cellulomonas biazotea, Arthrobacter glacialis, Corynebacterium poinsettiae,
Micrococcus luteus, and Agromyces mediolanus. Examples of C50 carotenoids produced
by Micrococcus luteus, Agromyces mediolanus, and Halobacterium salinarium are shown
in FIG 11.

Three C50 carotenoids (molecular formulae C50H72O2) have been isolated from
the psychrophilic bacterium Arthrobacter glacialis, including bicyclic decaprenoxanthin,
aliphatic bisanhydrobacterioruberin, and monocyclic A.g. 470 (Arpin N, et al. Acta Chem
Scand B 29:921-6, 1975).

It is clear that carotenoid characteristics such as antioxidant and pigment
capabilities improve with a greater number of conjugated double bonds. In view of
production and other technical limitations, however, commercial use of carotenoids has
been substantially limited to those no longer than C40. To allow production of C50
carotenoids in quantities sufficient to exploit their improved properties commercially,
it would be desirable to have the capability to convert C40 carotenoids to C50
carotenoids by genetic manipulation.

SUMMARY OF THE INVENTION 

The present invention is based on isolated nucleic acid molecules that encode
polypeptides that allow C40 carotenoids to be converted to carotenoids having greater
than 40 carbon atoms (C>40), such as a C50 carotenoid. These polypeptides can be used
in vitro or in vivo. The isolated nucleic acid molecules can be introduced into a
production cell, wherein the production cell becomes capable of converting a C40
carotenoid to a C>40 carotenoid, such as a C50 carotenoid.

In one aspect, the invention features an isolated polypeptide, isolated nucleic acid
molecules encoding the polypeptide, and production cells that include the isolated nucleic
acid molecules. The isolated polypeptide includes at least one amino acid sequence
selected from the group consisting of (a) the amino acid sequence set forth in SEQ ID
NOS: 04, 05, 06, 10, 11, 12, 17, 18, 19, 20, 24, 25 or 26; (b) an amino acid sequence
having at least 10 contiguous amino acid residues of the amino acid sequence set forth in
SEQ ID NOS: 04, 05, 06, 10, 11, 12, 17, 18, 19, 20, 24, 25 or 26; (c) an amino acid
sequence having one or more conservative amino acid substitutions within the amino acid
sequence set forth in SEQ ID NOS: 04, 05, 06, 10, 11, 12, 17, 18, 19, 20, 24, 25 or 26;
and (d) an amino acid sequence having at least 65% sequence identity with the amino
acid sequences of (a) or (b). Polypeptides at least 10 amino acid residues in length are
useful for, among other things, generating specific binding agents, such as antibodies.
Polypeptides having at least 65% sequence identity with the amino acid sequences of (a)
or (b) are useful for creating specific binding agents that vary in binding strength, as well
as for creating polypeptides with enzymatic activities that vary in binding strength (Km)
and/or turnover rate (Kcat).

The nucleic acid molecule can encode a polypeptide capable of converting a C40
carotenoid to a C50 carotenoid, a C40 carotenoid to a C45 carotenoid, a C45 carotenoid
to a C50 carotenoid, or capable of synthesizing a C40 carotenoid. These polypeptides can
be used in vitro or in vivo.

The invention also features an isolated nucleic acid molecule or a production cell
containing the nucleic acid molecule. The nucleic acid molecule includes a nucleic acid
sequence selected from the group consisting of: (a) the nucleotide sequence set forth in
SEQ ID NOS: 01, 02, 03, 07, 08, 09, 13, 14, 15, 16, 21, 22 or 23; (b) a nucleic acid
sequence having at least 10 contiguous nucleotides of the nucleotide sequence set forth in
SEQ ID NOS: 01, 02, 03, 07, 08, 09, 13, 14, 15, 16, 21, 22 or 23; (c) a nucleic acid
sequence that hybridizes under moderately stringent conditions to the nucleotide sequence
of (a); and (d) a nucleic acid sequence having 65% sequence identity with the nucleic
acid sequence of (a) or (b). These nucleic acid molecules are useful for identifying other
nucleic acid sequences that encode polypeptides with enzymatic activities similar to those
described herein. Methods such as the polymerase chain reaction (PCR), which utilizes
short fragments of the disclosed sequences, or Northern and/or Southern blotting
procedures, which utilize slightly longer fragments, can be used to identify substantially
similar sequences.
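
As an illustration of criterion (b), a candidate sequence can be screened for a run of at
least 10 contiguous nucleotides that also occurs in a reference sequence. The following
is a minimal sketch only; the function name and the toy sequences are hypothetical and
are not taken from the sequence listing.

```python
def shares_contiguous_run(query: str, reference: str, min_len: int = 10) -> bool:
    """Return True if query and reference share an identical run of at
    least min_len contiguous characters (case-insensitive)."""
    if min_len <= 0:
        raise ValueError("min_len must be positive")
    query, reference = query.upper(), reference.upper()
    # Index every window of length min_len found in the reference.
    windows = {
        reference[i:i + min_len]
        for i in range(len(reference) - min_len + 1)
    }
    # Any shared run of min_len or more contains a shared window of min_len.
    return any(
        query[i:i + min_len] in windows
        for i in range(len(query) - min_len + 1)
    )


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    reference = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    candidate = "TTTT" + reference[5:15] + "TTTT"
    print(shares_contiguous_run(candidate, reference))
    print(shares_contiguous_run("ACACACACACACAC", reference))
```

The same check applies to amino acid sequences, since it only compares characters.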

In another aspect, the invention features a method for making a C50 carotenoid.
The method includes contacting at least one of the polypeptides described above with a
C40 carotenoid such that the C50 carotenoid is made. A C50 carotenoid also can be
made by culturing the production cell described above under conditions wherein the C50
carotenoid is made.

In yet another aspect, the invention features a method for making a C45
carotenoid. The method includes contacting at least one of the polypeptides described
above with a C40 carotenoid such that the C45 carotenoid is made. A C45 carotenoid
also can be made by culturing the production cell described above under conditions
wherein the C45 carotenoid is made.

The invention also features a method for making a polypeptide. The method 
includes culturing the production cell described above under conditions such that the 
polypeptide is made. 

In another aspect, the invention features a specific binding agent that binds to the
polypeptide described above.

In yet another aspect, the invention features a method for making a C>40
carotenoid. The method includes culturing a production cell, wherein the production cell
includes an exogenous nucleic acid molecule, wherein the exogenous nucleic acid
molecule encodes a polypeptide that elongates a C>40 carotenoid by at least one carbon
atom, wherein the product produced by the polypeptide is a carotenoid having a carbon
backbone of >40 carbon atoms. The term "carbon backbone" refers to the single
contiguous chain of carbon-carbon bonds found in carotenoids. The exogenous
nucleic acid molecule can include a nucleic acid sequence selected from the group
consisting of: (a) the nucleotide sequence set forth in SEQ ID NOS: 01, 02, 03, 07, 08,
09, 13, 14, 15, 16, 21, 22 or 23; (b) a nucleotide sequence having at least 10 consecutive
nucleotides of the nucleotide sequence set forth in SEQ ID NOS: 01, 02, 03, 07, 08, 09,
13, 14, 15, 16, 21, 22 or 23; (c) a nucleic acid sequence that hybridizes under moderately
stringent conditions to the nucleotide sequence of (a); and (d) a nucleic acid sequence
having 65% sequence identity with the nucleic acid sequence of (a) or (b). The
exogenous nucleic acid molecule can encode a polypeptide, wherein the polypeptide
includes an amino acid sequence selected from the group consisting of: (a) the amino
acid sequence of SEQ ID NOS: 04, 05, 06, 10, 11, 12, 17, 18, 19, 20, 24, 25 or 26; (b) an
amino acid sequence having at least 10 contiguous amino acid residues of the amino acid
sequence set forth in SEQ ID NOS: 04, 05, 06, 10, 11, 12, 17, 18, 19, 20, 24, 25 or 26;
(c) an amino acid sequence having one or more conservative amino acid substitutions
within the amino acid sequence of SEQ ID NOS: 04, 05, 06, 10, 11, 12, 17, 18, 19, 20,
24, 25 or 26; and (d) an amino acid sequence having at least 65% sequence identity with
the amino acid sequences of (a) or (b).

These and other aspects of the invention are discussed in more detail in the
following detailed description.



SEQUENCE LISTING 

The nucleic and amino acid sequences listed in the accompanying sequence listing
are shown using standard letter abbreviations for nucleotide bases, and three-letter codes
for amino acids. Only one strand of each nucleic acid sequence is shown, but the
complementary strand is understood to be included by any reference to the displayed
strand.
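
Because only one strand is displayed, the complementary strand has to be derived from
it when needed. A minimal sketch of that derivation follows, assuming plain A/C/G/T DNA
text; the function name is illustrative, and ambiguity codes or RNA bases are deliberately
not handled.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of seq, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))  # prints GCAT
```

Applying the function twice returns the displayed strand, which is a convenient sanity
check.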

SEQ ID NO: 01 is the nucleic acid sequence for the A. mediolanus lctA gene (a
lycopene cyclase).
SEQ ID NO: 02 is the nucleic acid sequence for the A. mediolanus lctB gene.
SEQ ID NO: 03 is the nucleic acid sequence for the A. mediolanus lctC gene.
SEQ ID NO: 04 is the amino acid sequence encoded by SEQ ID NO: 01.
SEQ ID NO: 05 is the amino acid sequence encoded by SEQ ID NO: 02.
SEQ ID NO: 06 is the amino acid sequence encoded by SEQ ID NO: 03.
SEQ ID NO: 07 is the nucleic acid sequence for the M. luteus lctA gene.
SEQ ID NO: 08 is the nucleic acid sequence for the M. luteus lctB gene.
SEQ ID NO: 09 is the nucleic acid sequence for the M. luteus lctC gene.
SEQ ID NO: 10 is the amino acid sequence encoded by SEQ ID NO: 07.
SEQ ID NO: 11 is the amino acid sequence encoded by SEQ ID NO: 08.
SEQ ID NO: 12 is the amino acid sequence encoded by SEQ ID NO: 09.
SEQ ID NO: 13 is the nucleic acid sequence for the A. mediolanus idi gene.
SEQ ID NO: 14 is the nucleic acid sequence for the A. mediolanus crtE gene.
SEQ ID NO: 15 is the nucleic acid sequence for the A. mediolanus crtB gene.
SEQ ID NO: 16 is the nucleic acid sequence for the A. mediolanus crtI gene.
SEQ ID NO: 17 is the amino acid sequence encoded by SEQ ID NO: 13.
SEQ ID NO: 18 is the amino acid sequence encoded by SEQ ID NO: 14.
SEQ ID NO: 19 is the amino acid sequence encoded by SEQ ID NO: 15.
SEQ ID NO: 20 is the amino acid sequence encoded by SEQ ID NO: 16.
SEQ ID NO: 21 is the nucleic acid sequence for the M. luteus crtE gene.
SEQ ID NO: 22 is the nucleic acid sequence for the M. luteus crtB gene.
SEQ ID NO: 23 is the nucleic acid sequence for the M. luteus crtI gene.
SEQ ID NO: 24 is the amino acid sequence encoded by SEQ ID NO: 21.
SEQ ID NO: 25 is the amino acid sequence encoded by SEQ ID NO: 22.
SEQ ID NO: 26 is the amino acid sequence encoded by SEQ ID NO: 23.

SEQ ID NOS: 27-30 are primers used to amplify regions of the carotenogenic
operon from the Y1 clone.
SEQ ID NOS: 31 and 32 are primers used to amplify ORFY.
SEQ ID NO: 33 is a primer used in combination with SEQ ID NO: 32 to amplify
the region of A. mediolanus genomic DNA containing the X1, X2, and Y ORFs.
SEQ ID NOS: 34 and 35 are primers used to amplify a mutated ORFX1, ORFX2,
and ORFY fragment.
SEQ ID NOS: 36 and 37 are primers used to amplify a mutated ORFX2 fragment.
SEQ ID NOS: 38 and 39 are primers used to amplify a mutated ORFY fragment.
SEQ ID NOS: 40 and 41 are primers used to make a probe to identify M. luteus
homologs.
SEQ ID NOS: 42-45 are primers used for M. luteus genomic walking.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG 1 is the nucleotide sequence of the 9-Kb Y1 operon, the C50 carotenoid-
producing operon from A. mediolanus.

FIG 2 contains HPLC chromatograms of carotenoid extracts from A. mediolanus,
E. coli transformed with the idi-Y construct, E. coli transformed with the idi-crtI
construct, a lycopene standard, and E. coli transformed with the idi-X2 construct.

FIG 3A contains chromatograms of carotenoid extracts from A. mediolanus and E.
coli transformed with the idi-ORFY construct (Yellow E. coli clone Y33). The two
analyses show a peak at virtually the same retention time.

FIG 3B contains visible spectra for the A. mediolanus extract and an extract from
E. coli transformed with the idi-ORFY construct (Yellow E. coli clone Y33). The visible
spectra for both peaks are virtually identical.

FIG 4 contains mass spectra of carotenoid extracts from A. mediolanus and from E. coli
transformed with the idi-ORFY construct (Yellow E. coli clone Y33). The analysis
confirmed that the compound from clone Y33 and A. mediolanus at a retention time of 7
minutes had the same mass.

FIG 5 contains HPLC chromatograms of carotenoids extracted from E. coli
transformed with the idi-crtI construct and a lycopene standard (Sigma).

FIG 6 contains visible spectra for carotenoids extracted from E. coli transformed
with the idi-crtI construct and a lycopene standard (Sigma). The visible spectra are
virtually identical.

FIG 7 contains mass spectra of a lycopene standard, carotenoids produced in E.
coli transformed with the idi-crtI construct, and carotenoids produced in E. coli
transformed with the idi-ORFX2 construct.

FIG 8 is a visible-spectrophotometric analysis of carotenoid extracts from A.
mediolanus and mutant E. coli clones. The mutant E. coli clones produced the C40
carotenoid lycopene and no C50 carotenoid, while A. mediolanus produced the C50
carotenoid decaprenoxanthin.

FIG 9 is a schematic of the arrangement of genes within the biosynthetic pathway
for the production of a C50 carotenoid for A. mediolanus, M. luteus, C. glutamicum, H.
salinarium, and M. thermoautotrophicum.

FIG 10 is a schematic of the biosynthetic pathway for the production of
decaprenoxanthin in A. mediolanus and the postulated role of the lctA, lctB, and lctC
genes.

FIG 11 depicts examples of C50 carotenoid structures reported in the literature.

FIG 12 is the nucleotide sequence of the C50 carotenoid-producing operon from
M. luteus ATCC383.

DETAILED DESCRIPTION

I. Terms 

Unless otherwise noted, technical terms are used according to conventional usage.
Definitions of common terms in molecular biology may be found in Benjamin Lewin,
Genes VII, Oxford University Press, 1999 (ISBN 0-19-879276-X); Kendrew et al.
(editors), The Encyclopedia of Molecular Biology, Blackwell Science Ltd., 1994 (ISBN
0-632-02182-9); and Robert A. Meyers (editor), Molecular Biology and Biotechnology:
a Comprehensive Desk Reference, VCH Publishers, Inc., 1995 (ISBN 1-56081-569-8).

Carotenoid - A molecule that includes at least two isoprenoid units joined in
such a manner that the two joined isoprenoid units have two methyl groups in a
1,6-positional relationship. The term "carotenoid" also includes derivatives having one or
more hydrogen atoms replaced with a substituent group or atom. Non-limiting examples
of substituents include 1) hydroxyl groups (yielding an alcohol); 2) methoxyl groups
(derived from an alcohol); 3) glycosyl (sugar) residues (attached by an ether bond); 4)
fatty acid residues (attached by an ester bond); 5) carbonyl groups (yielding aldehydes or
ketones); 6) sulfates; 7) carboxylic acids; and 8) epoxides. Additional carbon atoms can
be added via the substituent group. Hydrogen atoms can be replaced anywhere on the
molecule, including within the methyl groups in the 1,6-positional relationship.
Non-limiting examples of typical carotenoids include β-carotene, phytoene, lycopene,
dehydrogenans P-452, decaprenoxanthin, 4,4'-diapophytoene, and norbixin.

CX - The carotenoid molecules of the present application are characterized by the
term "CX", wherein "C" refers to carbon atoms and the "X" refers to the total number of
carbon atoms in the isoprenoid units of the carotenoid molecule.

C>X - The designation "C>X carotenoid" means a carotenoid having more than
X carbon atoms total in the isoprenoid units of the carotenoid molecule. Similarly, C<X is
used to identify a carotenoid having fewer than X carbon atoms.

Homology - A term referring to the sequence identity between two or more 
sequences. 

Isoprenoid - A molecule that is a multiple of the C5 hydrocarbon isoprene
(2-methyl-1,3-butadiene).

Polypeptide - The term "polypeptide" includes any chain of amino acids at least
eight amino acids in length, regardless of post-translational modification.

Nucleic acid - The term "nucleic acid" as used herein encompasses both RNA and
DNA including, without limitation, cDNA, genomic DNA, and synthetic (e.g., chemically
synthesized) DNA. The nucleic acid can be double-stranded or single-stranded. Where
single-stranded, the nucleic acid can be the sense strand or the antisense strand. In
addition, nucleic acid can be circular or linear.

Isolated - The term "isolated" as used herein with reference to a polypeptide
refers to a polypeptide that has been separated from the cellular components that naturally
accompany it. Typically, the polypeptide is isolated when it is at least 60% (e.g., 70%,
80%, 90%, 92%, 95%, 98%, or 99%), by weight, free from proteins and naturally-
occurring organic molecules that are naturally associated with it. In general, an isolated
polypeptide will yield a single major band on a non-reducing polyacrylamide gel.

The term "isolated" as used herein with reference to nucleic acid refers to a
naturally-occurring nucleic acid that is not immediately contiguous with both of the
sequences with which it is immediately contiguous (one on the 5' end and one on the 3'
end) in the naturally-occurring genome of the organism from which it is derived. For
example, an isolated nucleic acid can be, without limitation, a recombinant DNA
molecule of any length, provided one of the nucleic acid sequences normally found
immediately flanking that recombinant DNA molecule in a naturally-occurring genome is
removed or absent. Thus, an isolated nucleic acid includes, without limitation, a
recombinant DNA that exists as a separate molecule (e.g., a cDNA or a genomic DNA
fragment produced by PCR or restriction endonuclease treatment) independent of other
sequences as well as recombinant DNA that is incorporated into a vector, an
autonomously replicating plasmid, a virus (e.g., a retrovirus, adenovirus, or herpes virus),
or into the genomic DNA of a prokaryote or eukaryote. In addition, an isolated nucleic
acid can include a recombinant DNA molecule that is part of a hybrid or fusion nucleic
acid sequence.

The term "isolated" as used herein with reference to nucleic acid also includes any
non-naturally-occurring nucleic acid since non-naturally-occurring nucleic acid sequences
are not found in nature and do not have immediately contiguous sequences in a naturally-
occurring genome. For example, non-naturally-occurring nucleic acid such as an
engineered nucleic acid is considered to be isolated nucleic acid. Engineered nucleic acid
can be made using common molecular cloning or chemical nucleic acid synthesis
techniques. Isolated non-naturally-occurring nucleic acid can be independent of other
sequences, or incorporated into a vector, an autonomously replicating plasmid, a virus
(e.g., a retrovirus, adenovirus, or herpes virus), or the genomic DNA of a prokaryote or
eukaryote. In addition, a non-naturally-occurring nucleic acid can include a nucleic acid
molecule that is part of a hybrid or fusion nucleic acid sequence.

It will be apparent to those of skill in the art that a nucleic acid existing among
hundreds to millions of other nucleic acid molecules within, for example, cDNA or
genomic libraries, or gel slices containing a genomic DNA restriction digest, is not to be
considered an isolated nucleic acid.

Exogenous - The term "exogenous" as used herein with reference to nucleic acid
and a particular cell refers to any nucleic acid that does not originate from that particular
cell as found in nature. Thus, non-naturally-occurring nucleic acid is considered to be
exogenous to a cell once introduced into the cell. Nucleic acid that is naturally-occurring
also can be exogenous to a particular cell. For example, an entire chromosome isolated
from a cell of person X is an exogenous nucleic acid with respect to a cell of person Y
once that chromosome is introduced into Y's cell.

ORF (open reading frame) - An "ORF" is a series of nucleotide triplets (codons)
encoding a sequence of amino acids at least 100 amino acids in length without any
termination codons.
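
The ORF definition above can be applied mechanically by scanning each reading frame for
runs of codons uninterrupted by a termination codon. The sketch below is illustrative
only: it assumes the standard termination codons TAA, TAG, and TGA, scans the three
forward frames of the displayed strand only, and uses a hypothetical function name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed: standard termination codons


def find_open_reading_frames(seq: str, min_codons: int = 100):
    """Return (start, end) pairs, 0-based and end-exclusive, for every run
    of at least min_codons codons without a termination codon in the three
    forward reading frames of seq."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                # A stop codon closes the current run; keep it if long enough.
                if (pos - run_start) // 3 >= min_codons:
                    orfs.append((run_start, pos))
                run_start = pos + 3
            pos += 3
        # A run may also extend to the end of the sequence.
        if (pos - run_start) // 3 >= min_codons:
            orfs.append((run_start, pos))
    return orfs


if __name__ == "__main__":
    # Hypothetical sequence: 120 alanine codons, a stop, then 5 more codons.
    toy = "GCT" * 120 + "TAA" + "GCT" * 5
    print(find_open_reading_frames(toy))
```

To cover the complementary strand as well, the same scan can be repeated on the reverse
complement of the sequence.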

Probes and primers - Nucleic acid probes and primers may be prepared readily 
based on the amino acid sequences and nucleic acid sequences provided by this invention. 

A "probe" comprises an isolated nucleic acid attached to a detectable label or
reporter molecule. Typical labels include radioactive isotopes, ligands, chemiluminescent
agents, and polypeptides. Methods for labeling and guidance in the choice of labels
appropriate for various purposes are discussed in, e.g., Sambrook et al. (ed.), Molecular
Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, N.Y., 1989, and Ausubel et al. (ed.), Current Protocols in Molecular
Biology, Greene Publishing and Wiley-Interscience, New York (with periodic updates),
1987.

"Primers" are short nucleic acids, preferably DNA oligonucleotides, 10
nucleotides or more in length. A primer may be annealed to a complementary target DNA
strand by nucleic acid hybridization to form a hybrid between the primer and the target
DNA strand, and then extended along the target DNA strand by a DNA polymerase.
Primer pairs can be used for amplification of a nucleic acid sequence, e.g., by the
polymerase chain reaction (PCR), or other nucleic-acid amplification methods known in
the art.

Methods for preparing and using probes and primers are described, for example,
in references such as Sambrook et al. (ed.), Molecular Cloning: A Laboratory Manual,
2nd ed., vol. 1-3, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989;
Ausubel et al. (ed.), Current Protocols in Molecular Biology, Greene Publishing and
Wiley-Interscience, New York (with periodic updates), 1987; and Innis et al., PCR
Protocols: A Guide to Methods and Applications, Academic Press: San Diego, 1990. PCR
primer pairs can be derived from a known sequence, for example, by using computer
programs intended for that purpose such as Primer Designer 3 for Windows by Scientific
& Educational Software (Durham, NC).

One of skill in the art will appreciate that the specificity of a particular probe or
primer generally increases with the length of the probe or primer. Thus, for example, a
primer comprising 20 consecutive nucleotides will anneal to a target with higher
specificity than a corresponding primer of only 15 nucleotides. Accordingly, in order to
obtain greater specificity, probes and primers may be selected that comprise, for example, 10,
20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190,
200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 350, 400, 450, 500, 550, 600, 650,
700 or more consecutive nucleotides.

Recombinant - A "recombinant" nucleic acid is one having (1) a sequence that is
not naturally occurring in the organism in which it is expressed or (2) a sequence made by
an artificial combination of two otherwise-separated, shorter sequences. This artificial
combination is often accomplished by chemical synthesis or, more commonly, by the
artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering
techniques. "Recombinant" is also used to describe nucleic acid molecules that have been
artificially manipulated, but contain the same regulatory sequences and coding regions
that are found in the organism from which the nucleic acid was isolated.

Sequence identity - The similarity between two or more nucleic acid sequences
or amino acid sequences is referred to as "Sequence Identity." The "percent sequence
identity" between a particular nucleic acid or amino acid sequence and a sequence
referenced by a particular sequence identification number is determined as follows.

First, a nucleic acid or amino acid sequence is compared to the sequence set forth
in a particular sequence identification number using the BLAST 2 Sequences (Bl2seq)
program from the stand-alone version of BLASTZ containing BLASTN version 2.0.14
and BLASTP version 2.0.14. This stand-alone version of BLASTZ can be obtained at
www.fr.com or www.ncbi.nlm.nih.gov. Instructions explaining how to use the Bl2seq
program can be found in the readme file accompanying BLASTZ. Bl2seq performs a
comparison between two sequences using either the BLASTN or BLASTP algorithm.
BLASTN is used to compare nucleic acid sequences, while BLASTP is used to compare
amino acid sequences. To compare two nucleic acid sequences, the options are set as
follows: -i is set to a file containing the first nucleic acid sequence to be compared (e.g.,
C:\seq1.txt); -j is set to a file containing the second nucleic acid sequence to be compared
(e.g., C:\seq2.txt); -p is set to blastn; -o is set to any desired file name (e.g., C:\output.txt);
-q is set to -1; -r is set to 2; and all other options are left at their default setting. For
example, the following command can be used to generate an output file containing a
comparison between two sequences: C:\Bl2seq -i c:\seq1.txt -j c:\seq2.txt -p blastn -o
c:\output.txt -q -1 -r 2.

To compare two amino acid sequences, the options of Bl2seq are set as follows: -i
is set to a file containing the first amino acid sequence to be compared (e.g., C:\seq1.txt);
-j is set to a file containing the second amino acid sequence to be compared (e.g.,
C:\seq2.txt); -p is set to blastp; -o is set to any desired file name (e.g., C:\output.txt); and
all other options are left at their default setting. For example, the following command can
be used to generate an output file containing a comparison between two amino acid
sequences: C:\Bl2seq -i c:\seq1.txt -j c:\seq2.txt -p blastp -o c:\output.txt.
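
The two option sets described above can be assembled programmatically before a
comparison is run. The sketch below only builds the argument list and does not execute
anything. The helper name and the default executable name "bl2seq" are assumptions; the
flags and their values are the ones stated in the text.

```python
def bl2seq_arguments(first_file: str, second_file: str, program: str,
                     output_file: str, executable: str = "bl2seq") -> list[str]:
    """Build the Bl2seq argument list described in the text.

    program must be "blastn" (nucleic acid) or "blastp" (amino acid).
    The executable name is an assumption and may differ per installation.
    """
    if program not in ("blastn", "blastp"):
        raise ValueError("program must be 'blastn' or 'blastp'")
    args = [executable, "-i", first_file, "-j", second_file,
            "-p", program, "-o", output_file]
    if program == "blastn":
        # Nucleic acid comparisons additionally set -q to -1 and -r to 2.
        args += ["-q", "-1", "-r", "2"]
    return args


if __name__ == "__main__":
    print(" ".join(bl2seq_arguments("seq1.txt", "seq2.txt", "blastn", "output.txt")))
    print(" ".join(bl2seq_arguments("seq1.txt", "seq2.txt", "blastp", "output.txt")))
```

If the program is installed, the returned list could be passed to subprocess.run; all
options not listed remain at their defaults, as the text requires.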
If the target sequence shares homology with any portion of the identified sequence
(i.e., the sequence identified by a SEQ ID NO herein), then the designated output file will
present those regions of homology as aligned sequences. If the target sequence does not
share homology with any portion of the identified sequence, then the designated output
file will not present aligned sequences. Once aligned, a length is determined by counting
the number of consecutive nucleotides or amino acid residues from the target sequence
presented in alignment with sequence from the identified sequence starting with any
matched position and ending with any other matched position. A matched position is any
position where an identical nucleotide or amino acid residue is presented in both the
target and identified sequence. Gaps presented in the target sequence are not counted
since gaps are not nucleotides or amino acid residues. Likewise, gaps presented in the
identified sequence are not counted since target sequence nucleotides or amino acid
residues are counted, not nucleotides or amino acid residues from the identified sequence.

The percent identity over a determined length is determined by counting the
number of matched positions over that length and dividing that number by the length
followed by multiplying the resulting value by 100. For example, if (1) a 1000 nucleotide
target sequence is compared to the sequence set forth in SEQ ID NO: 1, (2) the Bl2seq
program presents 200 nucleotides from the target sequence aligned with a region of the
sequence set forth in SEQ ID NO: 1 where the first and last nucleotides of that 200
nucleotide region are matches, and (3) the number of matches over those 200 aligned
nucleotides is 180, then the 1000 nucleotide target sequence contains a length of 200 and
a percent identity over that length of 90 (i.e., 180 / 200 * 100 = 90).
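
The length and percent identity rules can be written out as a short calculation. The
following is a sketch under stated assumptions: the alignment is supplied as two
equal-length strings in which "-" marks a gap, columns are numbered from 0, and the
function names are illustrative rather than part of any BLAST output format.

```python
GAP = "-"  # assumed gap symbol in the aligned strings


def percent_identity(matched_positions: int, length: int) -> float:
    """Matched positions divided by length, multiplied by 100."""
    if length <= 0:
        raise ValueError("length must be a positive integer")
    return 100 * matched_positions / length


def span_length_and_matches(target_aln: str, identified_aln: str,
                            start: int, end: int) -> tuple[int, int]:
    """Return (length, matched positions) for alignment columns start..end
    inclusive. Both end columns must be matched positions. Only target
    residues are counted toward the length, so gap columns in the target
    add nothing."""
    if len(target_aln) != len(identified_aln):
        raise ValueError("aligned strings must have equal length")
    if not 0 <= start < end < len(target_aln):
        raise ValueError("start and end must be distinct columns in range")

    def is_match(col: int) -> bool:
        return target_aln[col] != GAP and target_aln[col] == identified_aln[col]

    if not (is_match(start) and is_match(end)):
        raise ValueError("a span must start and end on matched positions")
    columns = range(start, end + 1)
    length = sum(1 for col in columns if target_aln[col] != GAP)
    matches = sum(1 for col in columns if is_match(col))
    return length, matches


if __name__ == "__main__":
    # Worked example from the text: 180 matches over a length of 200.
    print(percent_identity(180, 200))  # prints 90.0
```

With a gap in the identified sequence, for instance target "ACGTTA" aligned against
"AC-TTA", the length over the whole alignment is 6 and the matched positions number 5.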

It will be appreciated that a single nucleic acid or amino acid target sequence that
aligns with an identified sequence can have many different lengths with each length
having its own percent identity. For example, a target sequence containing a 20-
nucleotide region (SEQ ID NO: 46) that aligns with an identified sequence (SEQ ID NO:
47) as follows has many different lengths including those listed in Table 1.

                     1                 20
Target Sequence:     AGGTCGTGTACTGTCAGTCA
                     | || ||| |||| |||| |
Identified Sequence: ACGTGGTGAACTGCCAGTGA

TABLE 1

Starting Position   Ending Position   Length   Matched Positions   Percent Identity
        1                  20           20            15                 75.0
        1                  18           18            14                 77.8
        1                  15           15            11                 73.3
        6                  20           15            12                 80.0
        6                  17           12            10                 83.3
        6                  15           10             8                 80.0
        8                  20           13            10                 76.9
        8                  16            9             7                 77.8
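
The rows of Table 1 can be recomputed directly from the two aligned 20-nucleotide
sequences shown above. The sketch below assumes an ungapped alignment and 1-based,
inclusive positions as used in the table; the function name is illustrative.

```python
TARGET = "AGGTCGTGTACTGTCAGTCA"      # target sequence from the alignment
IDENTIFIED = "ACGTGGTGAACTGCCAGTGA"  # identified sequence from the alignment


def table_row(start: int, end: int):
    """Return (start, end, length, matched positions, percent identity)
    for 1-based inclusive positions of the ungapped alignment."""
    if not 1 <= start < end <= len(TARGET):
        raise ValueError("positions out of range")
    pairs = list(zip(TARGET, IDENTIFIED))
    first, last = pairs[start - 1], pairs[end - 1]
    if first[0] != first[1] or last[0] != last[1]:
        raise ValueError("a length must start and end on matched positions")
    span = pairs[start - 1:end]
    length = len(span)
    matches = sum(t == i for t, i in span)
    return start, end, length, matches, round(100 * matches / length, 1)


# The (starting position, ending position) pairs listed in Table 1.
SPANS = [(1, 20), (1, 18), (1, 15), (6, 20), (6, 17), (6, 15), (8, 20), (8, 16)]

if __name__ == "__main__":
    for span in SPANS:
        print(*table_row(*span))
```

Other spans are possible as well, provided they begin and end on matched positions.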



It is noted that the percent identity value is rounded to the nearest tenth. For
example, 78.11, 78.12, 78.13, and 78.14 are rounded down to 78.1, while 78.15, 78.16,
78.17, 78.18, and 78.19 are rounded up to 78.2. It is also noted that the length value will
always be an integer.
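
The rounding rule stated above sends a value ending in exactly 5 hundredths upward.
Binary floating-point values do not always honor that rule, so the sketch below uses
decimal arithmetic instead. This is one possible way to implement the rule, not a
prescribed one, and the function names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

TENTH = Decimal("0.1")


def round_to_tenth(value: str) -> Decimal:
    """Round a decimal string to the nearest tenth, with halves rounded up."""
    return Decimal(value).quantize(TENTH, rounding=ROUND_HALF_UP)


def rounded_percent_identity(matched_positions: int, length: int) -> Decimal:
    """Percent identity over a length, rounded to the nearest tenth."""
    if length <= 0:
        raise ValueError("length must be a positive integer")
    exact = Decimal(100 * matched_positions) / Decimal(length)
    return exact.quantize(TENTH, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for text in ("78.11", "78.14", "78.15", "78.19"):
        print(text, "->", round_to_tenth(text))
```

Passing the value as a string keeps the input exact before it is rounded.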

Accordingly, the invention provides nucleic acid sequences and amino acid
sequences that share at least 60, 65, 70, 75, 80, 85, 90, 95, 97, and 98% sequence identity
to SEQ ID NOS: 01, 02, 03, 07, 08, 09, 13, 14, 15, 16, 21, 22, and 23, and SEQ ID NOS:
04, 05, 06, 10, 11, 12, 17, 18, 19, 20, 24, 25, and 26, respectively.

Specific binding agent - A "specific binding agent" is an agent that is capable of
specifically binding to the polypeptides of the present invention, and may include
polyclonal antibodies, monoclonal antibodies (including humanized monoclonal
antibodies) and fragments of monoclonal antibodies such as Fab, F(ab')2 and Fv
fragments, as well as any other agent capable of specifically binding to the epitopes on
the proteins.

Antibodies to the polypeptides, and fragments thereof, of the present invention 
may be useful for purification of the polypeptides. The amino acid and nucleic acid 
sequences provided herein allow for the production of specific antibody-based binding 
25 agents to these polypeptides. 
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Monoclonal or polyclonal antibodies may be produced to full-length polypeptides, 
polypeptides that are less than full-length, or variants thereof. Optimally, antibodies 
raised against epitopes on these antigens will specifically detect the polypeptides. That is, 
antibodies raised against the polypeptide would recognize and bind the polypeptides, and 
5 would not substantially recognize or bind to other polypeptides. The determination that 
an antibody specifically binds to an antigen is made by any one of a number of standard 
immunoassay methods; for instance, Western blotting, Sambrook et al. (ed.), Molecular 
Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Laboratory Press, 
Cold Spring Harbor, N.Y., 1989. 

To determine that a given antibody preparation (such as a preparation produced in
a mouse against SEQ ID NO: 4) specifically detects a polypeptide having the amino acid
sequence of SEQ ID NO: 4 by Western blotting, total cellular protein is extracted from
cells and electrophoresed through a sodium dodecyl sulfate (SDS) polyacrylamide gel.
The proteins are then transferred to a membrane (for example, nitrocellulose) and the
antibody preparation is incubated with the membrane. After washing the membrane to
remove non-specifically bound antibodies, the presence of specifically bound antibodies
can be detected with anti-mouse antibody conjugated to an enzyme such as alkaline
phosphatase; application of 5-bromo-4-chloro-3-indolyl phosphate/nitro blue tetrazolium
results in the production of a densely blue-colored compound by immuno-localized
alkaline phosphatase.

Isolated polypeptides suitable for use as an immunogen can be isolated from
transfected cells, transformed cells, or from wild-type cells. Concentration of protein in
the final preparation is adjusted, for example, by concentration on an Amicon filter
device, to the level of a few micrograms per milliliter. Polypeptides that range in size
from eight amino acid residues to a full-length polypeptide having enzymatic activity can
be utilized as an immunogen. Polypeptides that are less than full-length may be
chemically synthesized using standard methods, or may be obtained by cleavage of the
whole polypeptide followed by purification of the desired size of polypeptide.
Polypeptides as short as eight amino acids in length are immunogenic when presented to
an immune system in the context of a Major Histocompatibility Complex (MHC)
molecule, such as MHC class I or MHC class II. Accordingly, polypeptides comprising
at least 8, 10, 20, 25, 30, 35, 40, 45, 50, 55, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350,
400, 450, 500, 550, 600, 650, 700, 750, 800, 900, 1000, 1050, 1100, 1150, 1200, 1250,
1300, 1350 or more consecutive (contiguous) amino acids of the disclosed amino acid
sequences may be employed as immunogens for producing antibodies.

Monoclonal antibodies to any of the polypeptides disclosed herein can be
prepared from murine hybridomas according to the classic method of Kohler & Milstein
(Nature 256:495 (1975)) or a derivative method thereof.

Polyclonal antiserum containing antibodies to the heterogeneous epitopes of any
polypeptide disclosed herein can be prepared by immunizing suitable animals with a
polypeptide, which can be unmodified or modified to enhance immunogenicity. An
effective immunization protocol for rabbits can be found in Vaitukaitis et al. (J. Clin.
Endocrinol. Metab. 33:988-991 (1971)).

Antibody fragments can be used in place of whole antibodies and can be readily
expressed in prokaryotic host cells. Methods of making and using immunologically
effective portions of monoclonal antibodies, also referred to as "antibody fragments," are
well known and include those described in Better & Horowitz (Methods Enzymol.
178:476-496 (1989)), Glockshuber et al. (Biochemistry 29:1362-1367 (1990)), U.S. Pat.
No. 5,648,237 ("Expression of Functional Antibody Fragments"), U.S. Pat. No. 4,946,778
("Single Polypeptide Chain Binding Molecules"), U.S. Pat. No. 5,455,030
("Immunotherapy Using Single Chain Polypeptide Binding Molecules"), and references
cited therein.

Hybridization - "Hybridization" is a method of testing for complementarity in the
base sequence of two nucleic acid molecules from different sources, and is based on the
ability of complementary single-stranded DNA and/or RNA molecules to form a duplex
molecule. Nucleic acid hybridization techniques can be used to obtain an isolated nucleic
acid within the scope of the invention. Briefly, any nucleic acid having homology to a
sequence set forth in SEQ ID NOS: 01, 02, 03, 07, 08, 09, 13, 14, 15, 16, 21, 22, and 23
can be used as a probe to identify a similar nucleic acid by hybridization under conditions
of moderate to high stringency. Once identified, the nucleic acid then can be purified,
sequenced, and analyzed to determine whether it is within the scope of the invention as
described herein.






PCT/US01/43906 



Hybridization can be done by Southern or Northern analysis to identify a DNA or
RNA sequence, respectively, that hybridizes with a nucleic acid of the invention (e.g., a
probe). The probe can be labeled with biotin, digoxigenin, an enzyme, or a
radioisotope such as 32P. The DNA or RNA to be analyzed can be electrophoretically
separated on an agarose or polyacrylamide gel, transferred to nitrocellulose, nylon, or
other suitable membrane, and hybridized with the probe using standard techniques well
known in the art such as those described in sections 7.39-7.52 of Sambrook et al. (1989)
Molecular Cloning, second edition, Cold Spring Harbor Laboratory, Plainview, NY.
Typically, a probe is at least about 20 nucleotides in length. For example, a probe
corresponding to a 20 nucleotide sequence set forth in SEQ ID NO: 01, 02, 03, 07, 08, 09,
13, 14, 15, 16, 21, 22, or 23 can be used to identify an identical or similar nucleic acid.
In addition, probes longer or shorter than 20 nucleotides can be used.
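By way of illustration only, candidate 20-nucleotide probe sequences and their antisense counterparts can be enumerated from a template sequence as sketched below in Python. The template shown is a hypothetical placeholder and is not taken from the sequence listing; the function names `candidate_probes` and `reverse_complement` are likewise assumptions of this example.

```python
# Complement map for unambiguous DNA bases only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the antisense (reverse-complement) strand of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def candidate_probes(template, length=20):
    """Return every contiguous window of the given length from the template."""
    template = template.upper()
    return [template[i:i + length] for i in range(len(template) - length + 1)]


# Hypothetical placeholder template (NOT a sequence from the sequence listing).
TEMPLATE = "ATGACCGGTCTGCTGGAACGTATCGCAGTTCTGGGTAAA"

probes = candidate_probes(TEMPLATE)
for probe in probes[:3]:
    print(probe, reverse_complement(probe))
```

Changing the `length` argument gives probes longer or shorter than 20 nucleotides.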

The invention also provides isolated nucleic acid molecules that are at least about
12 bases in length (e.g., at least about 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, 60,
100, 250, 500, 750, 1000, 1500, 2000, 3000, 4000, or 5000 bases in length) and that
hybridize, under moderate to highly stringent hybridization conditions, to the sense or
antisense strand of a nucleic acid having the sequence set forth in SEQ ID NO: 01, 02, 03,
07, 08, 09, 13, 14, 15, 16, 21, 22, or 23.

For the purpose of this invention, moderately stringent hybridization conditions
mean the hybridization is performed at about 42°C in a hybridization solution containing
25 mM KPO4 (pH 7.4), 5X SSC, 5X Denhardt's solution, 50 µg/mL denatured, sonicated
salmon sperm DNA, 50% formamide, 10% Dextran sulfate, and 1-15 ng/mL probe (about
5x10^7 cpm/µg), while the washes are performed at about 50°C with a wash solution
containing 2X SSC and 0.1% sodium dodecyl sulfate.

Highly stringent hybridization conditions mean the hybridization is performed at
about 42°C in a hybridization solution containing 25 mM KPO4 (pH 7.4), 5X SSC, 5X
Denhardt's solution, 50 µg/mL denatured, sonicated salmon sperm DNA, 50% formamide,
10% Dextran sulfate, and 1-15 ng/mL probe (about 5x10^7 cpm/µg), while the washes are
performed at about 65°C with a wash solution containing 0.2X SSC and 0.1% sodium
dodecyl sulfate.

Sequence Variants - With the provision of the amino acid sequences set forth in
SEQ ID NOS: 04, 05, 06, 10, 11, 12, 17, 18, 19, 20, 24, 25, and 26 and the corresponding
nucleic acid sequences set forth in SEQ ID NOS: 01, 02, 03, 07, 08, 09, 13, 14, 15, 16,
21, 22, and 23, variants of these sequences can be created. The sequences of these variants
share from about 50% to about 99% sequence identity with the corresponding sequence
provided in the accompanying sequence listing. In other embodiments, the variants share
at least 55, 60, 65, 70, 75, 80, 85, 87, 90, 92, 94, 96, or 98% sequence identity with the
sequences described herein.

Variant polypeptide sequences include polypeptides that differ in amino acid
sequence from the polypeptide sequences disclosed, but that retain biological activity
(e.g., enzymatic activity). Such polypeptides may be produced by manipulating the
nucleotide sequence encoding the enzyme using standard procedures such as site-directed
mutagenesis or the polymerase chain reaction. The simplest modifications involve the
substitution of one or more amino acids for amino acids having similar biochemical
properties. These so-called "conservative substitutions" are likely to have minimal
impact on the activity of the resultant polypeptide. Table 2 provides examples of
conservative substitutions.



TABLE 2

Original Residue    Conservative Substitution(s)
Arg                 Lys
Asn                 Gln
Asp                 Glu
Cys                 Ser
Gln                 Asn
Glu                 Asp
His                 Asn; Gln
Ile                 Leu; Val
Leu                 Ile; Val
Lys                 Arg; Gln; His
Met                 Leu; Ile
Phe                 Met; Leu; Tyr
Ser                 Thr
Thr                 Ser
Trp                 Tyr
Tyr                 Trp; Phe
Val                 Ile; Leu

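As a minimal sketch, the substitutions of Table 2 can be encoded as a lookup so that the differences between a native polypeptide and a variant can be classified. The Python below is illustrative only; the names `is_conservative` and `classify_substitutions` and the short example sequences are assumptions of this example. The lookup is directional, exactly as the table is written (for example, Cys to Ser is listed, but Ser to Cys is not).

```python
# Conservative substitutions transcribed from Table 2 (three-letter residue codes).
CONSERVATIVE_SUBSTITUTIONS = {
    "Arg": {"Lys"},
    "Asn": {"Gln"},
    "Asp": {"Glu"},
    "Cys": {"Ser"},
    "Gln": {"Asn"},
    "Glu": {"Asp"},
    "His": {"Asn", "Gln"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "His"},
    "Met": {"Leu", "Ile"},
    "Phe": {"Met", "Leu", "Tyr"},
    "Ser": {"Thr"},
    "Thr": {"Ser"},
    "Trp": {"Tyr"},
    "Tyr": {"Trp", "Phe"},
    "Val": {"Ile", "Leu"},
}


def is_conservative(original, replacement):
    """True if Table 2 lists `replacement` as a conservative substitution for `original`."""
    return replacement in CONSERVATIVE_SUBSTITUTIONS.get(original, set())


def classify_substitutions(native, variant):
    """Count (conservative, other) substitutions between two equal-length residue lists."""
    if len(native) != len(variant):
        raise ValueError("sequences must have the same length")
    conservative = other = 0
    for a, b in zip(native, variant):
        if a == b:
            continue  # unchanged position
        if is_conservative(a, b):
            conservative += 1
        else:
            other += 1
    return conservative, other


# Hypothetical four-residue example: Arg->Lys and Leu->Val are conservative, Ser->Pro is not.
print(classify_substitutions(["Met", "Arg", "Leu", "Ser"], ["Met", "Lys", "Val", "Pro"]))
```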
More substantial changes in enzymatic function or other features may be obtained
by selecting substitutions that are less conservative than those in Table 2, i.e., selecting
residues that differ more significantly in their effect on maintaining: (a) the structure of
the polypeptide backbone in the area of the substitution, for example, as a sheet or helical
conformation; (b) the charge or hydrophobicity of the molecule at the target site; or
(c) the bulk of the side chain. The substitutions that in general are expected to produce
the greatest changes in protein properties will be those in which: (a) a hydrophilic
residue, e.g., serine or threonine, is substituted for a hydrophobic residue, e.g., leucine,
isoleucine, phenylalanine, valine or alanine, or vice versa; (b) a cysteine or proline is
substituted for any other residue; (c) a residue having an electropositive side chain, e.g.,
lysine, arginine, or histidine, is substituted for an electronegative residue, e.g., glutamate
or aspartate, or vice versa; or (d) a residue having a bulky side chain, e.g.,
phenylalanine, is substituted for one not having a side chain, e.g., glycine, or vice versa.
The effects of these amino acid substitutions, deletions, or additions can be assessed for
polypeptides having enzyme activity by analyzing the ability of the polypeptide to
catalyze the conversion of the same substrate as the related native polypeptide to the same
product as the related native polypeptide. Accordingly, polypeptides having 5, 10, 20, 30,
40, 50 or fewer conservative amino acid substitutions are provided by the invention.

Polypeptides and nucleic acids encoding polypeptides can be produced by
standard DNA mutagenesis techniques, for example, M13 primer mutagenesis. Details of
these techniques are provided in Sambrook et al. (ed.), Molecular Cloning: A Laboratory
Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Laboratory Press, Cold Spring Harbor,
N.Y., 1989, Ch. 15. By the use of such techniques, variants may be created that differ in
minor ways from the native sequence, yet that still encode a polypeptide having
enzymatic activity. In their simplest form, such variants may differ from the disclosed
sequences by alteration of the coding region to fit the codon usage bias of the particular
organism into which the molecule is to be introduced.

Alternatively, the coding region may be altered by taking advantage of the
degeneracy of the genetic code to alter the coding sequence in such a way that, while the
nucleotide sequence is substantially altered, it nevertheless encodes a protein having an
amino acid sequence identical or substantially similar to the disclosed polypeptide
sequences. For example, the 5th amino acid residue of SEQ ID NO: 18 is alanine.
This is encoded in the open reading frame (ORF) by the nucleotide codon triplet GCG.
Because of the degeneracy of the genetic code, three other nucleotide codon triplets
(GCA, GCC, and GCT) also code for alanine. Thus, the nucleotide sequence of the ORF
can be changed at this position to any of these three codons without affecting the amino
acid composition of the encoded protein or the characteristics of the protein. Based upon
the degeneracy of the genetic code, variant DNA molecules may be derived from the
cDNA and gene sequences disclosed herein using standard DNA mutagenesis
techniques as described above, or by synthesis of DNA sequences. Thus, this invention
also encompasses nucleic acid sequences that encode the polypeptides but that vary from
the disclosed nucleic acid sequences by virtue of the degeneracy of the genetic code.
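The alanine example can be checked with a short Python sketch that builds the standard genetic code, lists the codons synonymous with GCG, and confirms that swapping the fifth codon of an ORF leaves the translation unchanged. This is an illustration under stated assumptions: the ORF shown is a hypothetical placeholder (not the ORF encoding SEQ ID NO: 18), and the function names are introduced here for illustration only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf):
    """Translate an ORF (DNA, length a multiple of 3) into one-letter amino acids."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - len(orf) % 3, 3))


def synonymous_codons(codon):
    """Return the other codons that encode the same amino acid, sorted."""
    aa = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa and c != codon)


def swap_codon(orf, residue_number, new_codon):
    """Replace the codon for the given 1-based residue number."""
    start = 3 * (residue_number - 1)
    return orf[:start] + new_codon + orf[start + 3:]


# Hypothetical ORF whose 5th codon is GCG (alanine); NOT from the sequence listing.
ORF = "ATGAAACTGACCGCGGGTTAA"

for codon in synonymous_codons("GCG"):
    variant = swap_codon(ORF, 5, codon)
    print(codon, translate(variant) == translate(ORF))
```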

Transformed - A "transformed" cell is a cell into which a nucleic acid molecule
has been introduced by molecular biology techniques. As used herein, the term
"transformation" encompasses all techniques by which a nucleic acid molecule might be
introduced into such a cell, including, but not restricted to, transfection with a viral
vector, conjugation, transformation with a plasmid vector, and introduction of naked
DNA by electroporation, lipofection, or particle gun acceleration.

Nucleic Acid Constructs - Polypeptides of the invention can be produced by
ligating a nucleic acid molecule encoding the polypeptide into a nucleic acid construct
such as an expression vector, and transforming a bacterial or eukaryotic production cell
with the expression vector. In general, nucleic acid constructs include expression control
elements operably linked to a nucleic acid sequence encoding a polypeptide of the
invention (e.g., lycopene 8 cyclase transferase A, B, or C). Expression control elements
do not typically encode a gene product, but instead affect the expression of the nucleic
acid sequence. As used herein, "operably linked" refers to connection of the expression
control elements to the nucleic acid sequence in such a way as to permit expression of the
nucleic acid sequence. Expression control elements can include, for example, promoter
sequences, enhancer sequences, response elements, polyadenylation sites, or inducible
elements.

In bacterial systems, a strain of E. coli such as DH10B or BL-21 can be used.
Suitable E. coli vectors include, but are not limited to, pUC18, pUC19, the pGEX series
of vectors that produce fusion proteins with glutathione S-transferase (GST), and the
pBluescript series of vectors. Transformed E. coli are typically grown exponentially and then
stimulated with isopropylthiogalactopyranoside (IPTG) prior to harvesting. In general,
fusion proteins produced from the pGEX series of vectors are soluble and can be purified
easily from lysed cells by adsorption to glutathione-agarose beads followed by elution in
the presence of free glutathione. The pGEX vectors are designed to include thrombin or
factor Xa protease cleavage sites such that the cloned target gene product can be released
from the GST moiety.

In eukaryotic host cells, a number of viral-based expression systems can be
utilized to express polypeptides of the invention. A nucleic acid encoding a polypeptide
of the invention can be cloned into, for example, a baculoviral vector such as pBlueBac
(Invitrogen, San Diego, CA) and then used to co-transfect insect cells such as Spodoptera
frugiperda (Sf9) cells with wild-type DNA from Autographa californica multiply
enveloped nuclear polyhedrosis virus (AcMNPV). Recombinant viruses producing
polypeptides of the invention can be identified by standard methodology. Alternatively, a
nucleic acid encoding a polypeptide of the invention can be introduced into an SV40,
retroviral, or vaccinia based viral vector and used to infect suitable host cells.

A polypeptide within the scope of the invention can be "engineered" to contain an
amino acid sequence that allows the polypeptide to be captured onto an affinity matrix.
For example, a tag such as c-myc, hemagglutinin, polyhistidine, or Flag™ tag (Kodak)
can be used to aid polypeptide purification. Such tags can be inserted anywhere within
the polypeptide including at either the carboxyl or amino termini. Other fusions that
could be useful include enzymes that aid in the detection of the polypeptide, such as
alkaline phosphatase.

Agrobacterium-mediated transformation, electroporation and particle gun
transformation can be used to transform plant cells. Illustrative examples of
transformation techniques are described in U.S. Patent No. 5,204,253 (particle gun) and
U.S. Patent No. 5,188,958 (Agrobacterium). Transformation methods utilizing the Ti and
Ri plasmids of Agrobacterium spp. typically use binary type vectors (see Walkerpeach, C. et
al., in Plant Molecular Biology Manual, S. Gelvin and R. Schilperoort, eds., Kluwer
Dordrecht, C1:1-19 (1994)). If cell or tissue cultures are used as the recipient tissue for
transformation, plants can be regenerated from transformed cultures by techniques known
to those skilled in the art.

Production Cell - a cell that can be cultured such that it produces the carotenoids
described herein and/or the polypeptides and nucleic acid sequences described herein.
This includes, without limitation, prokaryotic cells such as R. sphaeroides cells and
eukaryotic cells such as plant, yeast, and other fungal cells. It is noted that cells
containing an isolated nucleic acid of the invention are not required to express the isolated
nucleic acid. In addition, the isolated nucleic acid can be integrated into the genome of
the cell or maintained in an episomal state. In other words, cells can be stably or
transiently transfected with an isolated nucleic acid of the invention.

Any method can be used to introduce an isolated nucleic acid into a cell. In fact,
many methods for introducing nucleic acid into a cell, whether in vivo or in vitro, are well
known to those skilled in the art. For example, calcium phosphate precipitation,
conjugation, electroporation, heat shock, lipofection, microinjection, and viral-mediated
nucleic acid transfer are common methods that can be used to introduce nucleic acid
molecules into a cell. In addition, naked DNA can be delivered directly to cells in vivo as
described elsewhere (U.S. Pat. Nos. 5,580,859 and 5,589,466). Furthermore, nucleic acid
can be introduced into cells by generating transgenic animals.

Any method can be used to identify cells that contain an isolated nucleic acid
within the scope of the invention. For example, PCR and nucleic acid hybridization
techniques such as Northern and Southern analysis can be used. In some cases,
immunohistochemistry and biochemical techniques can be used to determine if a cell
contains a particular nucleic acid by detecting the expression of a polypeptide encoded by
that particular nucleic acid. For example, the polypeptide of interest can be detected with
an antibody having specific binding affinity for that polypeptide, which indicates that the
cell not only contains the introduced nucleic acid but also expresses the encoded
polypeptide. Enzymatic activities of the polypeptide of interest also can be detected, or an
end product (e.g., a particular carotenoid) can be detected, as an indication that the cell
contains the introduced nucleic acid and expresses the encoded polypeptide from that
introduced nucleic acid.

The cells described herein can contain a single copy, or multiple copies (e.g.,
about 5, 10, 20, 35, 50, 75, 100 or 150 copies), of a particular exogenous nucleic acid.
For example, a bacterial cell (e.g., Rhodobacter) can contain about 50 copies of an
exogenous nucleic acid of the invention. In addition, the cells described herein can
contain more than one particular exogenous nucleic acid. For example, a bacterial cell
can contain about 50 copies of exogenous nucleic acid X as well as about 75 copies of
exogenous nucleic acid Y. In these cases, each different nucleic acid can encode a
different polypeptide having its own unique enzymatic activity. For example, a bacterial
cell can contain two different exogenous nucleic acids such that a high level of a
carotenoid is produced. In addition, a single exogenous nucleic acid can encode one or
more polypeptides. For example, a single nucleic acid can contain sequences that encode
three or more different polypeptides.

Microorganisms that are suitable for producing carotenoids may or may not
naturally produce carotenoids, and include prokaryotic and eukaryotic microorganisms,
such as bacteria, yeast, and fungi. In particular, yeast such as Phaffia rhodozyma
(Xanthophyllomyces dendrorhous), Candida utilis, and Saccharomyces cerevisiae; fungi
such as Neurospora crassa, Phycomyces blakesleeanus, Blakeslea trispora, and
Aspergillus sp.; Archaea bacteria such as Halobacterium salinarium; and Eubacteria
including Pantoea species (formerly called Erwinia) such as Pantoea stewartii (e.g.,
ATCC Accession #8200), flavobacteria species such as Xanthobacter autotrophicus and
Flavobacterium multivorum, Zymomonas mobilis, Rhodobacter species such as R.
sphaeroides and R. capsulatus, E. coli, and E. vulneris can be used. Other examples of
bacteria that may be used include bacteria in the genus Sphingomonas and Gram negative
bacteria in the α-subdivision, including, for example, Paracoccus, Azotobacter,
Agrobacterium, and Erythrobacter. Eubacteria, and especially R. sphaeroides and R.
capsulatus, are particularly useful. R. sphaeroides and R. capsulatus naturally produce
certain carotenoids and grow on defined media. Such Rhodobacter species also are
non-pyrogenic, minimizing health concerns about use in nutritional supplements.
Streptomyces aeriouvifer, Bacillus subtilis, and Staphylococcus aureus also are suitable
production cells. In some embodiments, it can be useful to produce carotenoids in plants
and algae such as Haematococcus pluvialis, Dunaliella salina, Chlorella protothecoides,
Zea mays, Brassica napus, Arabidopsis thaliana, Tagetes erecta, Lycopersicum
esculentum, and Neospongiococcum excentrum.

It is noted that bacteria can be membranous or non-membranous bacteria. The
term "membranous bacteria" as used herein refers to any naturally-occurring, genetically
modified, or environmentally modified bacteria having an intracytoplasmic membrane.
An intracytoplasmic membrane can be organized in a variety of ways including, without
limitation, vesicles, tubules, thylakoid-like membrane sacs, and highly organized
membrane stacks. Any method can be used to analyze bacteria for the presence of
intracytoplasmic membranes including, without limitation, electron microscopy, light
microscopy, and density gradients. See, e.g., Chory et al. (1984) J. Bacteriol. 159:540-
554; Niederman and Gibson, Isolation and Physicochemical Properties of Membranes
from Purple Photosynthetic Bacteria. In: The Photosynthetic Bacteria, Ed. by Roderick
K. Clayton and William R. Sistrom, Plenum Press, pp. 79-118 (1978); and Lueking et al.
(1978) J. Biol. Chem. 253:451-457.

Examples of membranous bacteria that can be used include, without limitation,
Purple Non-Sulfur Bacteria, including bacteria of the Rhodospirillaceae family such as
those in the genus Rhodobacter (e.g., R. sphaeroides and R. capsulatus), the genus
Rhodospirillum, the genus Rhodopseudomonas, the genus Rhodomicrobium, and the
genus Rhodopila. The term "non-membranous bacteria" refers to any bacteria lacking
intracytoplasmic membrane. Membranous bacteria can be highly membranous bacteria.
The term "highly membranous bacteria" as used herein refers to any bacterium having
more intracytoplasmic membrane than R. sphaeroides (ATCC 17023) cells have after the
R. sphaeroides (ATCC 17023) cells have been (1) cultured chemoheterotrophically under
aerobic conditions for four days, (2) cultured chemoheterotrophically under anaerobic
conditions for four hours, and (3) harvested. Aerobic culture conditions include culturing
the cells in the dark at 30°C in the presence of 25% oxygen. Anaerobic culture conditions
include culturing the cells in the light at 30°C in the presence of 2% oxygen. After the
four hour anaerobic culturing step, the R. sphaeroides (ATCC 17023) cells are harvested by
centrifugation and analyzed.

II. Brief Overview

The present invention involves the identification, isolation, and cloning of genes
involved in a non-mevalonate pathway for carotenoid biosynthesis. In particular, the
isolated genes allow for the biosynthesis of a C40 carotenoid and the conversion of the
C40 carotenoid to a C50 carotenoid. The isolated genes can be introduced into a
production cell. The production cell can be used to produce the polypeptides for use in
vitro (outside of the cell) or the production cell can be used to make C>40 carotenoids,
such as C50 carotenoids and various derivatives.

The identification of one set of representative genes allows for the isolation of
genes that have similar nucleic acid and/or amino acid sequences, which have a similar
function. The isolated genes offer an advance in the art, because they allow for the
conversion of a C40 carotenoid to a C>40 carotenoid, such as a C50 carotenoid.

The nucleic acid sequences provided herein encode three separate polypeptides.
An important finding of the invention is that the activity of all three polypeptides can be
used to convert a C40 carotenoid to the C50 carotenoid. The nucleic acid molecules were
first isolated from A. mediolanus. Similar genes with substantial homology were then
isolated from M. luteus. The genes from M. luteus were also shown to be active. It is
believed that other similar genes with substantial homology could be isolated from other
bacteria using similar techniques, and that such genes fall within the present invention.

The present invention is particularly important because it provides a key step to 
the ability to convert carotenoids from the C40 level to the C50 level by genetic 
manipulation. 

The invention uses standard laboratory practices, such as for the cloning,
manipulation, and sequencing of nucleic acids, purification and analysis of proteins, and
other molecular biological and biochemical techniques, unless otherwise specified. Such
standard techniques are explained in detail in standard laboratory manuals such as
Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd edition, vol. 1-3, Cold
Spring Harbor, New York, 1989; and Ausubel et al., Current Protocols in Molecular
Biology, Greene Publ. Assoc. & Wiley-Interscience, 1989.

III. Experimental Materials, Methods, Results, and Examples - Agromyces
mediolanus

Brief outline of the subject matter described in section III

1. The selection of A. mediolanus as the bacterium from which genomic DNA
would be extracted.

2. The construction of a genomic DNA library, the isolation of genomic colonies,
and the selection of experimental working colonies. A particularly important
experimental working colony was called Y1.

3. The isolation of plasmid DNA from the Y1 colony, and the identification of a
carotenogenic operon contained therein.

4. The sequencing and sequence analysis of the carotenogenic operon.

5. The identification of seven (7) genes (idi, crtE, crtB, crtI, lctA (ORF X1), lctB
(ORF X2), and lctC (ORF Y)) from the operon, wherein one or more of the seven (7)
isolated genes allow for the biosynthesis of the C50 carotenoid and the conversion of a
C40 carotenoid to a C>40 carotenoid, such as a C50 carotenoid. The identification
included, among other aspects, the determination of the respective nucleic acid sequences
and encoded amino acid sequences.

6. The creation of constructs of certain combinations of the seven genes. The
constructs were amplified with primers and PCR. Deductive analysis was performed on
the amplified constructs to determine the capabilities of individual constructs. The
pathway of the associated biosynthetic reactions was determined. The portion of the
pathway associated with individual genes was also determined.

7. The recognition that four (4) (idi, crtE, crtB, crtI) of the seven (7) previously
unidentified isolated genes allow for the production of a C40 carotenoid, in
a manner having certain similarities to techniques already known in the art.

8. The realization that three (3) (lctA, lctB, lctC) of the seven (7) isolated genes
represented a significant advance to the art, because the genes allow for the conversion of
a C40 carotenoid to a C>40 carotenoid, such as a C50 carotenoid.

9. The realization that the activities that are provided by the three (3) genes (lctA,
lctB, lctC) can be used to convert a C40 carotenoid to a C50 carotenoid in a single step.

10. The cloning of certain constructs of the seven (7) isolated genes into host
bacteria, which resulted in successful carotenogenic reactions.

Details elaborating the brief outline are described in the remainder of section III.

A. Selection of Agromyces mediolanus; Agromyces mediolanus genomic DNA
preparation

Flavobacterium dehydrogenans was chosen as the bacterial source for the
identification of genes since the bacterium had been reported to produce both C40 and
C50 carotenoids (Weeks OB et al. Nature 224:879-82, 1969). Since F. dehydrogenans
was an unidentified bacterium in the ATCC (American Type Culture Collection), the
strain was submitted for identification. Microbial identification revealed the organism to
be Agromyces mediolanus. Although there were reports in the literature describing the
production of the C50 carotenoid decaprenoxanthin in (F. dehydrogenans) A. mediolanus
(Schwieter U, and Liaaen-Jensen S. Acta Chem Scand 23:1057, 1969, and Liaaen-Jensen
S, et al. Acta Chem Scand 22:1171-86, 1968), no reports were found on the genes
responsible for C50 carotenoid biosynthesis.

A. mediolanus was grown in 200 mL of nutrient broth for 36 hours at 30°C and
250 rpm. Cultured cells were centrifuged to form a cell pellet, washed by
resuspending the pellet in a 10 mM Tris:1 mM EDTA (ethylenediaminetetraacetate)
solution, and centrifuged again. The cell pellets were resuspended in 5 mL of GTE buffer
(50 mM glucose, 25 mM Tris-HCl, pH 8.0, 10 mM EDTA, pH 8.0) per 100 mL of
culture. The bacterial cell walls were lysed by adding lysozyme and Proteinase K, each
to a 1.0 mg/mL final concentration, and mutanolysin to a 5.5 µg/mL final concentration.
After a 1.5 hour incubation at 37°C, SDS (sodium dodecyl sulfate) was added to a final
concentration of 1% and the concentration of Proteinase K was brought to 2 mg/mL.
After incubation at 50°C for one hour, the solution containing the lysed cells was diluted
1:1 with fresh GTE buffer and NaCl was added to a 0.15 M concentration in the diluted
solution. The mixture was extracted with an equal volume of phenol:chloroform:isoamyl
alcohol (25:24:1) and centrifuged at 12,000 x g for 10 minutes. The supernatant was
removed and placed in a clean tube, extracted with an equal volume of chloroform, and
centrifuged at 3,000 x g for 10 minutes. The supernatant was treated with RNase and
precipitated with 2.5 volumes of ethanol. After mixing the solution, the precipitated
DNA was removed by spooling it on a glass rod. The spooled DNA was washed with
70% ethanol, air dried, and resuspended in 10 mM Tris, pH 8.5.

B. A. mediolanus genomic DNA library construction for isolation of the carotenoid
operon

A. mediolanus genomic DNA (80 µg) was digested at 37°C for 10 minutes with
2.8 units of Sau3A I restriction enzyme (Promega, Madison, WI). The digested DNA was
separated by gel electrophoresis using a 0.8% Tris-acetate-EDTA (TAE) agarose gel.
DNA fragments ranging from 7-10 Kb in size were excised and purified using a Qiagen
Gel Purification kit (Qiagen Inc., Valencia, CA). Vector to be used in the ligation
(pUC19) was prepared by digesting with BamH I restriction enzyme (New England
Biolabs, Inc., Beverly, MA), gel purifying, and dephosphorylating using shrimp alkaline
phosphatase (Roche Molecular Biochemicals, Indianapolis, IN). BamH I DNA fragments
(126 ng) were ligated into 50 ng of prepared pUC19 DNA at 14°C for 16 hours using T4
DNA ligase (Roche Molecular Biochemicals). The ligation reaction was precipitated by
adding 1/10 volume 7.5 M NH4OAc and 2.5 volumes ethanol, incubating at -20°C for 3
hours, centrifuging to obtain a DNA pellet, washing the pellet with 70% ethanol, drying
the pellet, and resuspending the pellet in 20 µL of 10 mM Tris buffer, pH 8.5. One
microliter of ligation reaction was used to electroporate 40 µL of ElectroMAX™
DH10B™ competent cells (Life Technologies, Inc., Rockville, MD). Electroporated cells
were recovered in SOC media and plated on LB plates containing 100 µg/mL of
ampicillin (LBA). The plating volume necessary to produce approximately 300
colonies/plate was determined by plating various volumes of transformed cells. Using this
information, 125 plates containing approximately 300 colonies each were plated from
transformations using the remainder of the ligation reaction. Plates were incubated at
37°C for one day and then at room temperature for one day. On the second day, one yellow
colony (Y1) was identified and streaked to a new LBA plate. Plasmid DNA of this colony was
isolated using a Qiaprep Spin Miniprep Kit (Qiagen, Inc.). EcoR I restriction digests
(New England Biolabs, Inc.) of the plasmid DNA showed the plasmid to contain an insert
approximately 9 Kb in size.
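
The ligation described above combines 126 ng of 7-10 Kb fragments with 50 ng of pUC19. As a rough illustration only, the following sketch converts those masses into an approximate insert:vector molar ratio. The helper names are hypothetical, the average mass of 650 g/mol per base pair is a common approximation, and the pUC19 length of 2686 bp is taken from the standard vector sequence rather than from this document.

```python
# Illustrative sketch, not part of the described protocol.
AVG_BP_MASS = 650.0  # assumed average g/mol per base pair of double-stranded DNA
PUC19_BP = 2686      # length of pUC19 (assumption: standard published sequence)


def pmol_of_dsdna(mass_ng: float, length_bp: float) -> float:
    """Convert a mass of double-stranded DNA (ng) into picomoles of molecules."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


def insert_to_vector_ratio(insert_ng: float, insert_bp: float,
                           vector_ng: float, vector_bp: float) -> float:
    """Molar ratio of insert molecules to vector molecules."""
    return pmol_of_dsdna(insert_ng, insert_bp) / pmol_of_dsdna(vector_ng, vector_bp)


if __name__ == "__main__":
    # The fragments were size-selected at 7-10 Kb, so the ratio is a range.
    for insert_bp in (7000, 8500, 10000):
        ratio = insert_to_vector_ratio(126, insert_bp, 50, PUC19_BP)
        print(f"{insert_bp} bp insert: insert:vector molar ratio ~ {ratio:.2f}")
```

Under these assumptions the ratio stays somewhat below 1:1 across the size-selected range, which is only an estimate because the true fragment length distribution is not stated.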
C. Subcloning and sequencing of the A. mediolanus carotenogenic operon

Several restriction enzymes, including BamH I and Pst I, were used to digest 2 µg
aliquots of plasmid DNA from the Y1 colony. A digest from BamH I produced two
fragments approximately 9 Kb and 3 Kb in size and a digest from Pst I produced four
fragments approximately 4.5, 3.0, 1.5, and 1.0 Kb in size. These fragments were gel
purified, ligated into pUC19, and transformed into ElectroMAX™ DH10B™ competent
cells as described above. The electroporated cells were plated on LB agar plates with 100
µg/mL of ampicillin and 50 µg/mL of 5-Bromo-4-Chloro-3-Indolyl-β-D-Galactopyranoside
(Xgal, media = LBAX). Single, white colonies corresponding to each
purified fragment were isolated. Plasmid DNA was isolated and used to obtain the DNA
sequence of each insert, using either M13F and M13R vector primers or sequencing
primers designed from internal DNA sequence. Individual sequences were aligned using
the software Clone Manager and Align Plus (Scientific and Educational Software,
Durham, NC).

D. Sequence analysis of the A. mediolanus carotenogenic operon 

The BLAST DNA sequence comparison program (National Center for
Biotechnology Information) was used to identify genes residing on the insert of the Y1
clone. The sequence of nucleotides residing on the insert of the Y1 clone was chosen as a
working operon (the Y1 operon), and the location of the genes residing on the Y1 operon
is shown in FIG 1. The BLAST analysis identified the following genes, in order of
location in the operon:

* idi, isopentenyl pyrophosphate isomerase,

* crtE, geranylgeranyl pyrophosphate synthase (GGPP synthase),

* crtB, phytoene synthase, and

* crtI, phytoene dehydrogenase (phytoene desaturase).

In addition, three open reading frames (ORFs) downstream of crtI were identified
to which no definitive function could be assigned using sequence similarity. The three
ORFs were given the following names:

* ORFX1, the first ORF downstream of crtI, was 372 nucleotides in length

* ORFX2, the second ORF downstream of crtI, was 348 nucleotides in length

* ORFY, the third ORF downstream of crtI, was 897 nucleotides in length
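
As a quick consistency check on the ORF lengths listed above, the sketch below confirms that each length is a whole number of codons and estimates the length of the encoded polypeptide. It assumes, hypothetically, that each stated length includes the stop codon; the document does not say so.

```python
# Illustrative sketch; the stop-codon assumption is not confirmed by the source.
ORF_LENGTHS_NT = {"ORFX1": 372, "ORFX2": 348, "ORFY": 897}


def encoded_residues(length_nt: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given nucleotide length."""
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a whole number of codons")
    codons = length_nt // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    for name, length_nt in ORF_LENGTHS_NT.items():
        print(f"{name}: {length_nt} nt, about {encoded_residues(length_nt)} residues")
```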

ORFX1 showed homology (33% sequence identity) to the lycopene cyclase
domain of the Rhizomucor carRP gene. The carRP gene encodes a polypeptide having
both phytoene synthase and lycopene cyclase activities. Therefore, it is likely that the
polypeptide encoded by the ORFX1 gene contributes cyclase activity during the
conversion of lycopene to decaprenoxanthin.

No genes with significant homology were detected for ORFX2 in the GenBank
database. The ORFY protein sequence had low homology with a DHNA-octaprenyltransferase
from Bacillus subtilis in the SwissProt database. This enzyme
catalyzes the attachment of a 40-carbon side chain to 1,4-dihydroxy-2-naphthoic acid
(DHNA). BLAST searches of the ORFY DNA sequence against the NCBI non-redundant
DNA database showed certain homology to ORFs identified in Deinococcus radiodurans,
Halobacterium sp. NRC-1 (National Research Council of Canada, a cell repository), and
Methanobacterium thermoautotrophicum. The Deinococcus radiodurans ORF in turn
shows low homology to a Schizosaccharomyces pombe para-hydroxybenzoate
polyprenyltransferase. The Halobacterium ORF shows significant homology to a
Rhodobacter capsulatus bacteriochlorophyll synthase gene, whose product catalyzes the
esterification of bacteriochlorophyll by geranylgeranyl-pyrophosphate, and low homology
to a Saccharomyces cerevisiae para-hydroxybenzoate polyprenyltransferase.

E. A. mediolanus DNA constructs for carotenoid production
1. The constructs and carotenoid production

Initial data indicated that the inclusion of the idi gene in an expression vector was 
likely necessary to achieve detectable carotenoid expression levels. The initial 
experiments also indicated that the use of a medium copy number vector was preferable 
to use of a high copy number vector, possibly due to a detrimental effect on the bacterial 
cell of maintaining the latter. Therefore, the expression vector pPROLarNde was used.
This vector is a modification of the pPROLar.A vector (CLONTECH Laboratories, Inc., 
Palo Alto, CA) into which an Nde I restriction site was inserted downstream of the 
ribosomal binding site. 

Primers were designed to amplify three regions of the Y1 operon: (a) the region
from idi through crtI, the idi-crtI construct (4.6 Kb); (b) the region from idi through
ORFX2, the idi-ORFX2 construct (5.3 Kb); and (c) the region from idi through
ORFY, the idi-ORFY construct (6.7 Kb). These primers were designed to introduce an
Nde I restriction site at the beginning of the amplified fragment and a Hind III restriction
site at the end of the amplified fragment. The sequences of the primers were as follows,
with the Nde I (CATATG) and Hind III (AAGCTT) restriction sites underlined:

Primer name   Primer sequence

AIDINDEF      5'-TTCATATGTCACTAGCCAGGCGAGATATCC-3' (SEQ ID NO: 27)
APDHIIIR      5'-GAAAGCTTAAGAAGATGCCGAGCGAGATG-3' (SEQ ID NO: 28)
AXHIIIR       5'-AGAAGCTTTGTACGGCACGAGGAAGAACAG-3' (SEQ ID NO: 29)
AYHIIIR       5'-GAAAGCTTCTCCGTGACGAGATCCTGAG-3' (SEQ ID NO: 30)
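
Because underlining does not survive in plain text, the restriction sites in the four primers above can be located programmatically. The sketch below is illustrative only: the function name is hypothetical, the primer sequences are transcribed from the table with spacing removed, and the recognition sequences used are the standard ones for Nde I and Hind III.

```python
# Illustrative sketch: locate engineered restriction sites in the primers.
SITES = {"Nde I": "CATATG", "Hind III": "AAGCTT"}

PRIMERS = {
    "AIDINDEF": "TTCATATGTCACTAGCCAGGCGAGATATCC",
    "APDHIIIR": "GAAAGCTTAAGAAGATGCCGAGCGAGATG",
    "AXHIIIR": "AGAAGCTTTGTACGGCACGAGGAAGAACAG",
    "AYHIIIR": "GAAAGCTTCTCCGTGACGAGATCCTGAG",
}


def find_sites(sequence: str) -> dict:
    """Return {enzyme: 0-based start index} for each site present in sequence."""
    return {
        enzyme: sequence.find(site)
        for enzyme, site in SITES.items()
        if site in sequence
    }


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        for enzyme, start in find_sites(seq).items():
            site = SITES[enzyme]
            # Show the site in brackets where the original used underlining.
            marked = seq[:start] + "[" + site + "]" + seq[start + len(site):]
            print(f"{name}: {enzyme} at index {start}: 5'-{marked}-3'")
```

Each primer carries its site a few bases from the 5' end, leaving a short overhang ahead of the recognition sequence.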



Due to the high GC content of A. mediolanus, PCR was conducted using the
Advantage®-GC Genomic Polymerase (CLONTECH) kit. The PCR reaction mix,
according to manufacturer's specifications, used a 1.0 M final GC-Melt concentration and
1.0 ng of A. mediolanus genomic DNA per µL of reaction mix in a 100-200 µL reaction.
The PCR reactions were performed in a Perkin Elmer Geneamp system 2400 under the
following conditions: (a) an initial denaturation at 94°C for 45 seconds; (b) 8 cycles of
(1) 94°C for 25 seconds, (2) 56°C for 1 minute, and (3) 72°C for 10 minutes; (c) 25
cycles of (1) 94°C for 25 seconds, (2) 60°C for 1 minute, and (3) 72°C for 10 minutes;
and (d) a final extension of 72°C for 10 minutes. The PCR reactions were subjected to
gel electrophoresis using a 0.8% TAE agarose gel. Fragments of the expected sizes were
gel purified as previously described. Purified DNA was digested overnight with Hind III
and Nde I to make the fragment ends compatible with digested pPROLarNde vector. The
digested PCR product was purified using a Qiagen PCR Purification column and
quantified on a spectrophotometer.
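
The two-stage cycling program described above can be written down as data and its total programmed hold time computed. This is a sketch only: the function and variable names are hypothetical, and ramp times between temperatures, which depend on the instrument, are ignored.

```python
# Illustrative sketch: total programmed hold time of the construct PCR.
def program_seconds(initial_s, stages, final_s):
    """Sum hold times: stages is a list of (cycles, (step_seconds, ...))."""
    total = initial_s + final_s
    for cycles, steps in stages:
        total += cycles * sum(steps)
    return total


# (a) 94°C 45 s; (b) 8 cycles of 25 s / 1 min / 10 min at a 56°C annealing step;
# (c) 25 cycles of 25 s / 1 min / 10 min at a 60°C annealing step;
# (d) final extension of 10 min.
CONSTRUCT_PCR_S = program_seconds(
    initial_s=45,
    stages=[(8, (25, 60, 600)), (25, (25, 60, 600))],
    final_s=600,
)

if __name__ == "__main__":
    print(f"Programmed hold time: {CONSTRUCT_PCR_S} s "
          f"({CONSTRUCT_PCR_S / 3600:.1f} h, excluding ramping)")
```

The 10-minute extension step dominates the run time, which is consistent with amplifying fragments of 4.6 to 6.7 Kb from GC-rich template.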

pPROLarNde vector (5 µg) was digested overnight with Hind III and Nde I and
purified using gel electrophoresis on a 1% TAE agarose gel and a Qiagen Gel Purification
Kit. The digested and purified vector was dephosphorylated using calf intestinal alkaline
phosphatase (CIAP, Promega) according to manufacturer's specifications with the
following exceptions: (a) 40 µL of eluent from the Qiagen purification was used directly
as the starting DNA, (b) the CIAP was used at a 1/20 enzyme dilution rather than a 1/100
dilution, and (c) the dephosphorylated DNA was purified using a Qiagen PCR
Purification Column rather than by ethanol precipitation.

The purified and digested PCR products were each ligated into 50 ng of prepared
pPROLarNde DNA at 16°C for 16 hours using T4 DNA ligase (Roche Molecular
Biochemicals). One µL of each ligation reaction was used to electroporate 40 µL of
ElectroMAX™ DH10B™ competent cells. Electroporated cells were recovered in SOC
media for one hour and plated on LB plates containing 50 µg/mL of kanamycin, 1 mM
isopropylthio-β-D-galactoside (IPTG), and 2% L-arabinose (LBKIA).

Two red colonies were isolated from E. coli transformed with the idi-crtI
construct; two red colonies were isolated from E. coli transformed with the idi-ORFX2
construct; one yellow colony was isolated from E. coli transformed with the idi-ORFY
construct. Each of these colonies had the desired insert size, as indicated by PCR and by
restriction enzyme digest with Hind III and Nde I. DNA sequencing of the X1-X2-Y
region was conducted on plasmid DNA from these colonies to check for PCR errors.

Carotenoids were extracted from 100 mL cultures grown for 3 days in LBKIA
media at 30°C and 200 rpm. Cells were pelleted by centrifugation at 12,000 g for 10
minutes, washed with sterile distilled water, and re-centrifuged. The pellet was dried and
resuspended in 2 mL of acetone by vortexing in the presence of glass beads. The
extraction of the carotenoids was performed at 55°C for a total of 1.5 hours and at room
temperature for one hour. Extractions were conducted in the dark to prevent light-induced
degradation of carotenoids, and with vortexing every 15 minutes to enhance cell
exposure to the solvent. The extraction mixture was then centrifuged at 27,00 g for 15
minutes to obtain a hard pellet of cell matter. The supernatant containing the carotenoids
was passed through a 0.2 micron filter and the absorption curve from 400-600 nm was read
on a Cary 100 spectrophotometer.

HPLC analysis of the carotenoid extracts from various clones is shown in FIG 2
and FIG 3. It is significant that the C50 carotenoid extracted from the E. coli clone with
the idi-Y A. mediolanus fragment showed a mass that was identical to that observed in
A. mediolanus wild type extract (FIG 4). Absorption curves showed that the carotenoid
material produced from E. coli containing the idi-crtI construct and the carotenoid
material produced from E. coli containing the idi-ORFX2 construct have a spectrum
identical to that of lycopene (a C40 carotenoid) (FIG 5). HPLC analysis of the extracts
and mass spectrometric analysis confirmed these observations (FIG 7).

The carotenoid material produced from the idi-ORFY construct exhibited a
spectrum that appeared to be a mixture of carotenoids, including both lycopene (FIG 6)
and the C50 carotenoid produced by the original Y1 clone (FIG 3B).



2. The relationship of ORFX1, ORFX2, and ORFY to the production of the C50
carotenoid

The production of the C50 carotenoid by the E. coli clone having the idi-ORFY
construct and lack of production by the clone having the idi-ORFX2 construct indicate
that ORFY was necessary for production of the Y1 C50 carotenoid. To help determine
whether the X1 and X2 ORFs were also necessary for production of the C50 carotenoid,
the following strategies were employed:

The first strategy is detailed in Example 1, and it involved cloning ORFY into the
idi-crtI/pPROLarNde construct to determine if the C50 carotenoid could be produced in
the absence of the X1 and X2 ORFs. Primers for the amplification of ORFY were
designed to introduce a Pac I restriction site at the beginning of the amplified fragment
and an Xba I restriction site at the end of the amplified fragment, which would insert the
ORFY fragment downstream of the idi-crtI genes. The sequences of the primers were as
follows, with the Pac I (TTAATTAA) and Xba I (TCTAGA) restriction sites underlined:

AYPACF 5'-GTCTTAATTAACTGCTGCTCTGCTCCACGGTCT-3' (SEQ ID NO: 31)
AYXBAR 5'-TATCTAGACGCTCCGTGACGAGATCCTGAG-3' (SEQ ID NO: 32)

The PCR reaction mix contained 1X Pfu buffer, 0.2 mM each dNTP, 5% dimethyl
sulfoxide (DMSO), 0.5 µM each primer, 10 units of Pfu DNA polymerase (Stratagene)
and 200 ng of A. mediolanus genomic DNA in a 200 µL reaction. The PCR reactions
were performed in a Perkin Elmer Geneamp system 2400 under the following conditions:
an initial denaturation at 94°C for 1 minute, 8 cycles of (1) 94°C for 30 seconds, (2) 57°C
for 45 seconds, and (3) 72°C for 3.5 minutes; 25 cycles of (1) 94°C for 30 seconds, (2)
62°C for 45 seconds, and (3) 72°C for 3.5 minutes; and a final extension of 72°C for 7
minutes. The PCR reactions were subjected to gel electrophoresis using a 1.0% TAE
agarose gel. A fragment of the expected size was gel purified as previously described.
Purified DNA was digested overnight with Pac I, purified using a Qiagen PCR
purification column, digested for 3.5 hours with Nde I restriction enzyme, purified with a
Qiagen PCR purification column, and eluted in 30 µL of 10 mM Tris.

The idi-crtI construct was similarly digested with Pac I and Xba I,
dephosphorylated with shrimp alkaline phosphatase (Roche, Basel, Switzerland), and gel
purified. Eighty µg of the digested and purified idi-crtI construct was ligated with 120 ng
of the ORFY product using T4 DNA ligase at 16°C for 16 hours. A control ligation with
no insert DNA was also performed. One microliter of each ligation reaction was used to
transform E. coli ElectroMAX™ DH10B™ competent cells. The transformation
reactions were recovered in 300 µL of SOC media for 1 hour and plated on both LB
media with 50 µg/mL kanamycin (LBK) and LBKIA media. Several colonies that grew
on the LBK plates were patched to LBKIA plates. Plasmid DNA was isolated from
single colonies and shown to have the desired insert size through digestion with Xba I
restriction enzyme.

The second strategy used a two-vector system. ORFY was cloned into the
Sph I/Xba I sites of pUC19 and used in double transformations with the
idi-crtI/pPROLarNde vector. Plasmid DNA was isolated from single colonies and digested
with Xba I and an Xba I/Sph I mix to check the insert size. Electrocompetent cells of
E. coli strain DH5αPRO (CLONTECH) were transformed with both the
idi-crtI/pPROLarNde vector and the ORFY/pUC19 vector in a 5:1 ratio due to a lower
transformation rate of the first vector. Cells were recovered in SOC media for 1 hour and
plated on LB media containing 100 µg/mL ampicillin and 50 µg/mL kanamycin (LBAK)
and LBKIA media with 100 µg/mL ampicillin (LBAKIA). Single colonies were patched
to new LBAKIA plates. All resulting colonies were red in color. Plasmid DNA was
isolated from double transformants and digested with Xba I to check the size of both
plasmids. Carotenoids were extracted from the clones and identified as lycopene (a C40
carotenoid) on the basis of the visible spectral profile.

The experiments described in the first and second strategies indicate that the
idi-crtI construct with the addition of ORFY, but without ORFX1 and ORFX2, produced
C40 carotenoids but did not produce C50 carotenoids.

The third strategy is detailed in Example 3 and involves site-directed mutagenesis
to introduce frameshift mutations individually in ORFX1, ORFX2, and ORFY to help
determine if the X1 and X2 ORFs were needed for production of the Y1 C50 carotenoid.
A plasmid containing the X1, X2, and Y ORFs in pUC19 was constructed as follows and
used as template for mutagenic PCR. The QuikChange™ Site-Directed Mutagenesis Kit
(Stratagene, La Jolla, CA) was then used to produce a vector containing a mutation in
ORFX1, a vector with a mutation in ORFX2, and a vector containing a mutation in
ORFY. Primers were designed to amplify the region of A. mediolanus genomic DNA
containing the X1, X2, and Y ORFs. These primers were designed to introduce an Sph I
restriction site at the beginning of the amplified fragment and an Xba I restriction site at
the end of the amplified fragment. The sequences of the primers were as follows, with
the Sph I (GCATGC) and Xba I (TCTAGA) restriction sites underlined:

AXSPHF 5'-TAGGCATGCAACGTCGAGGGGCTGTACTTC-3' (SEQ ID NO: 33)
AYXBAR 5'-TATCTAGACGCTCCGTGACGAGATCCTGAG-3' (SEQ ID NO: 32)

As part of the third strategy, the non-mutated ORFX1, ORFX2, ORFY fragment
was combined with an idi-crtI fragment. PCR was conducted using the
Advantage®-GC Genomic Polymerase (CLONTECH) Kit. The PCR reaction mix was
according to manufacturer's specifications, using a 1.0 M final GC-Melt concentration
and 1.0 ng of A. mediolanus genomic DNA per µL of reaction mix in a 100-200 µL
reaction. The PCR reactions were performed in a Perkin Elmer Geneamp system 2400
under the following conditions: an initial denaturation at 94°C for 1 minute, 8 cycles of
(1) 94°C for 30 seconds, (2) 56°C for 45 seconds, and (3) 72°C for 3.75 minutes; 25
cycles of (1) 94°C for 30 seconds, (2) 60°C for 45 seconds, and (3) 72°C for 3.75
minutes; and a final extension of 72°C for 7 minutes. The PCR reactions were subjected
to gel electrophoresis using a 1.0% TAE agarose gel. Fragments of the expected size
were gel purified as previously described. Purified DNA was digested overnight with
Xba I and Sph I restriction enzymes to make the fragment ends compatible with digested
vector and purified using a Qiagen PCR Purification column.

The pUC19 vector was digested with Sph I and Xba I, gel purified, and
dephosphorylated as described previously. The digested and purified vector (65 ng) was
ligated with 360 ng of the X1X2Y insert using T4 DNA ligase at 16°C for 16 hours. A
control ligation with no insert DNA was also performed. One microliter of each ligation
reaction was used to transform E. coli ElectroMAX™ DH10B™ competent cells. The
transformation reaction was recovered in 300 µL of SOC media for 1 hour and plated on
LBAX media. Single, white colonies were screened by PCR to determine if they
contained the desired insert. Plasmid DNA was isolated from seven colonies positive for
the insert. Equal amounts of DNA of each of the seven plasmids were pooled. Twenty-five
ng of the pooled X1X2Y/pUC19 plasmid DNA and 100 ng of idi-crtI plasmid DNA were
transformed into electrocompetent cells of the E. coli strain DH5αPRO. Cells were
recovered for 1 hour in SOC media and plated on LBAK and LBAKIA media. The
resulting colonies were either yellow or red, with red colonies presumably resulting from
errors in DNA replication during PCR of the X1X2Y fragment. Plasmid DNA was
isolated for three yellow colonies and exhibited the desired inserts upon digestion with
Xba I. Carotenoid extractions on these three cultures showed that they were producing
the C50 carotenoid of the original Y1 clone. Thus, the non-mutated ORFX1, ORFX2,
ORFY fragment combined with the idi-crtI fragment was capable of producing a C50
carotenoid when introduced into E. coli.

As another part of the third strategy, mutated ORFX1, ORFX2, and ORFY
fragments were individually combined with an idi-crtI fragment.
The following primers were used in mutagenesis:

X1A 5'-GCTCGTCGACGCGCGCTAGCCGGCTGTTCTTCTGG-3' (SEQ ID NO: 34)
X1B 5'-CCAGAAGAACAGCCGGCTAGCGCGCGTCGACGAGC-3' (SEQ ID NO: 35)

The underlined base was inserted, causing a frameshift mutation and creating a
unique Nhe I site in the plasmid.

In addition, a C nucleotide and a G nucleotide were deleted, respectively, from the
spaces in the X2A primer, and a C nucleotide and a G nucleotide were deleted,
respectively, from the spaces in the X2B primer. The first mutation introduced a
frameshift and a unique Nhe I site, while the second mutation eliminated a potential
translational start codon.

X2A 5'-GGAACGGGAGGCAGAGCA GGC TAGCTC ATCGGCGGGCCCTTCG-3' (SEQ ID NO: 36)
X2B 5'-GGGCCCGCCGATGAGCTA GCC TGCTCTGCCTCCCGTTCC-3' (SEQ ID NO: 37)

A G nucleotide was deleted from the space in the YA primer and a C was deleted
from the space in the YB primer, in order to create a frameshift and a unique Nhe I site.

YA 5'-GTGTTGATCC AGCT AGCGGGCGCGATGCGGTGAAG-3' (SEQ ID NO: 38)
YB 5'-TTCACCGCATCGCGCCCGCT AGCTGGATCAACACC-3' (SEQ ID NO: 39)
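
Mutagenic primer pairs of this kind are typically designed as complementary strands covering the same mutated region. As an illustrative check only, the sketch below confirms that the X1A and X1B sequences given above are exact reverse complements and that each contains the Nhe I recognition sequence (GCTAGC, the standard site for that enzyme). The function name is hypothetical, and the X2 and Y pairs are left out because their printed spacing marks deleted bases.

```python
# Illustrative sketch: verify the X1A/X1B mutagenic primer pair.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


X1A = "GCTCGTCGACGCGCGCTAGCCGGCTGTTCTTCTGG"  # SEQ ID NO: 34
X1B = "CCAGAAGAACAGCCGGCTAGCGCGCGTCGACGAGC"  # SEQ ID NO: 35
NHE_I_SITE = "GCTAGC"

if __name__ == "__main__":
    print("X1B is reverse complement of X1A:", reverse_complement(X1A) == X1B)
    print("Nhe I site in X1A at index:", X1A.find(NHE_I_SITE))
    print("Nhe I site in X1B at index:", X1B.find(NHE_I_SITE))
```

Because GCTAGC is palindromic, the site appears in both strands, which is what allows the Nhe I digest described below to screen for the mutation.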

Mutagenic PCR was conducted using CLONTECH's Genome Advantage 5X
Buffer, 1.0 M GC-Melt, 1.1 mM MgOAc, 0.2 mM each dNTP, 15 ng of template DNA,
and 2.5 units of Pfu Turbo DNA polymerase (Stratagene) in a 50 µL reaction. Plasmid
DNA of the X1X2Y/pUC19 construct, described above, was used as template. PCR was
conducted according to the manufacturer's specification in the QuikChange™
Site-Directed Mutagenesis Kit, using a 14 minute extension time and 18 cycles of PCR.
Dpn I treatment and transformation were conducted as per manufacturer's specifications
except that 2 µL of Dpn I-treated DNA was used in each transformation and cells were
recovered in SOC media for 0.5 hour. Cells were plated on LBA plates and plasmid DNA was
isolated from ten single colonies of each mutant type. Plasmid DNA of each colony was
digested with Nhe I restriction enzyme to check for the Nhe I site
introduced through the mutagenic primer. All but one colony had a single Nhe I site,
compared to the lack of a site in the X1X2Y/pUC19 template plasmid. The presence of
the desired mutations and lack of unwanted mutations in other ORFs (e.g., an unwanted
mutation in the Y ORF in the X1 mutation vector) were confirmed by sequencing.
Plasmid DNA from two mutant colonies for the X1 mutation and one mutant colony each
for the X2 and Y mutations were used, along with the idi-crtI/pPROLarNde vector, in double
transformations of electrocompetent cells of E. coli strain DH5αPRO. Control
transformations using the unmutated X1X2Y/pUC19 vector and the idi-crtI/pPROLarNde
vector were also conducted. All transformations used 25 ng of the pUC19-based vector
and 100 ng of the pPROLarNde-based vector. Cells were recovered for one hour in SOC
media and plated on LBAKIA media. Colonies from all of the transformations involving
mutant plasmids were red, whereas the control double transformants were yellow.

Visible spectral analysis revealed that all the mutant clones (red) produced the C40
carotenoid lycopene while the control double transformant and A. mediolanus (yellow)
produced the C50 carotenoid decaprenoxanthin (FIG 8).

Hence it was concluded that none of the fragments with mutations in ORFX1,
ORFX2, or ORFY, combined with the idi-crtI fragment, was capable of producing a C50
carotenoid.
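
The construct results reported in this section can be collected into a small table and checked against the rule that C50 production requires intact copies of all three ORFs. The sketch below is a simplified encoding for illustration: the names and data structure are hypothetical, and the idi-ORFY construct is recorded as a C50 producer even though its extract also contained lycopene.

```python
# Illustrative sketch: tabulate reported phenotypes against intact ORFs.
from typing import NamedTuple

ALL_ORFS = frozenset({"ORFX1", "ORFX2", "ORFY"})


class Observation(NamedTuple):
    construct: str
    intact_orfs: frozenset
    c50_produced: bool


OBSERVATIONS = [
    Observation("idi-crtI", frozenset(), False),
    Observation("idi-ORFX2", frozenset({"ORFX1", "ORFX2"}), False),
    Observation("idi-ORFY", ALL_ORFS, True),
    Observation("idi-crtI + ORFY", frozenset({"ORFY"}), False),
    Observation("idi-crtI + X1X2Y (unmutated)", ALL_ORFS, True),
    Observation("idi-crtI + X1X2Y (X1 frameshift)", ALL_ORFS - {"ORFX1"}, False),
    Observation("idi-crtI + X1X2Y (X2 frameshift)", ALL_ORFS - {"ORFX2"}, False),
    Observation("idi-crtI + X1X2Y (Y frameshift)", ALL_ORFS - {"ORFY"}, False),
]


def predicts_c50(intact_orfs: frozenset) -> bool:
    """Rule under test: C50 carotenoid is made only if all three ORFs are intact."""
    return ALL_ORFS <= intact_orfs


if __name__ == "__main__":
    for obs in OBSERVATIONS:
        agrees = predicts_c50(obs.intact_orfs) == obs.c50_produced
        product = "C50" if obs.c50_produced else "C40 only"
        print(f"{obs.construct:35s} {product:9s} rule agrees: {agrees}")
```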

The results of the three strategies combined with the results from the tests of the
previous three constructs (idi-crtI, idi-ORFX2, and idi-ORFY) indicate a significant
finding: that the activities of all three ORFs can be used to convert a C40 carotenoid to a
C50 carotenoid. If the genes of all three separate ORFs were not present, the conversion
of the C40 carotenoid to a C>40 carotenoid did not occur.

3. The naming of the ORF genes which allow for the conversion of a C40
carotenoid to a C50 carotenoid

Because the ORFX1, ORFX2, and ORFY genes were all required for the
conversion of the C40 lycopene (an acyclic carotenoid) to the C50 decaprenoxanthin (a
carotenoid having two ε-ionone rings), the genes have been designated as lycopene
ε-cyclase transferases, as described in the following table:

* ORFX1 is designated lycopene ε-cyclase transferase A, or lctA.

* ORFX2 is designated lycopene ε-cyclase transferase B, or lctB.

* ORFY is designated lycopene ε-cyclase transferase C, or lctC.

Based on the data described herein, a biosynthetic pathway for decaprenoxanthin
in A. mediolanus is shown in FIG 10. It is believed that the genes described herein could
be present in other C50 producing bacteria such as Sarcina flava, Corynebacterium
poinsettiae, Arthrobacter sp., such as A. glacialis, Sarcina luteus (Micrococcus luteus),
Halobacterium cutirubrum and salinarium, and Cellulomonas biazotea. It is believed
that such genes could be isolated using techniques similar to those used for the present
invention, and accordingly, such genes are considered part of the present invention.


IV. Experimental Materials, Methods, Results, and Examples— Micrococcus luteus 

Brief outline of the subject matter described in section IV 

1. Selection of five C50 carotenoid producing bacteria as candidates for study;
isolation of genomic DNA.

2. Synthesis of A. mediolanus lctC probe from previously described colony Y1.

3. Determination of homology between genes from each candidate bacterium and
the lctC probe of A. mediolanus.

4. Selection of M. luteus ATCC 383 for study in view of a finding of substantial
homology of one of its genes with the lctC probe.

5. Construction of a genomic DNA library for M. luteus ATCC 383.

6. Finding substantial homology between lctA, lctB, and lctC of M. luteus ATCC
383 and lctA, lctB, and lctC of A. mediolanus.

7. Identification of the carotenogenic operon for M. luteus ATCC 383.

8. Sequencing and sequence analysis for the carotenogenic operon.

9. Identification of six genes (crtE, crtB, crtI, lctA, lctB, and lctC) within the
operon.

10. C50 production in M. luteus ATCC 383.

11. BLAST analyses; determining homology between genes.

Details elaborating the brief outline are described in the remainder of section IV. 


A. Preparation of genomic DNA for candidate bacteria; Choice of Micrococcus 
luteus (ATCC 383) 

Five bacteria (species and strains) that produce C50 carotenoids were obtained
from ATCC:

* Micrococcus luteus ATCC 147.

* Micrococcus luteus ATCC 383.

* Cellulomonas biazotea ATCC 486.

* Halobacterium salinarium ATCC 33170.

* Halobacterium salinarium NRC-1.

In addition, the following control was employed:

* Agromyces mediolanus ATCC 13930 (control).

Genomic DNA was isolated from each line plus the A. mediolanus control, using a
Gentra Puregene DNA Isolation Kit (Gentra, Minneapolis, MN). Genomic DNA (1.0-1.5
µg) was used in digests with the restriction enzymes Pst I and Xho I, and separated on a
0.8% Tris-Acetate-EDTA (TAE) agarose gel. DIG-labeled molecular weight markers II
and III (Roche Biomedical Products, Indianapolis, IN) were also included on the
gel/membrane. DNA was transferred to a nylon membrane using a routine Southern
transfer procedure.

DIG-labeled probes (894 bp) of the A. mediolanus lctC locus were synthesized
using a PCR DIG Probe Synthesis Kit (Roche). Half-strength and full-strength DIG
probes were amplified using plasmid DNA of the previously described Y1 clone as
template and the ORFYF and ORFYR primers in 50 µL PCR reactions. The 5' end of the
ORFYF primer is located 14 bp upstream of the lctC translational start codon and the 5'
end of the ORFYR primer is located 15 bp upstream of the lctC translational stop codon.

ORFYF: 5'-AGAGGAGCCGAGCGATGAG-3' (SEQ ID NO: 40)
ORFYR: 5'-CGTACCAGATCAGCAGCATC-3' (SEQ ID NO: 41)

The PCR reactions were separated on a 1% TAE-agarose gel and the probes were
gel purified using a QIAquick Gel Purification Kit (Qiagen, Valencia, CA). After baking,
membranes were prehybridized in EasyHyb Buffer (Roche) for at least 2 hours at 42°C
and hybridized overnight at 42°C using 400 nL of the half-strength DIG labeling reaction
per mL of hybridization solution. Washing of the membranes and detection of
hybridization was achieved using a Wash and Block Buffer Set (Roche). Membranes
were washed two times for 5-10 minutes each at room temperature in 2X SSC/0.1% SDS
and two times for 15-20 minutes each at 55°C in 0.1X SSC/0.1% SDS. After rinsing with
washing buffer, the membranes were covered with blocking buffer and placed on a shaker
for 1.5 hours at room temperature. The blocking buffer was replaced with fresh blocking
buffer containing 150 mU of AP conjugate per mL of buffer and shaken at room
temperature for an additional 30 minutes. Membranes were then washed twice for 15
minutes each at room temperature with washing buffer, followed by a five minute wash
with detection buffer. The detection buffer was replaced with fresh detection buffer
containing 20 µL of NBT/BCIP solution per mL of buffer. This was placed in the dark at
room temperature with no shaking until color developed, after which the buffer was
replaced with 10 mM Tris-1 mM EDTA solution.

Of the five strains tested, M. luteus ATCC 383 and M. luteus ATCC 147 showed
fragments having the highest homology to the lctC probe. Restriction digests of genomic
DNA from these two genotypes and A. mediolanus were done using the enzymes Xho I,
ApaL I, and Sac I. DNA was separated on a 0.8% TAE-agarose gel, transferred to nylon
membrane, and hybridized with the lctC probe as described above with the following
exceptions. DIG-labeled Marker VII was included on gels/membranes. The DIG-labeled
probe, which had been stored at -20°C, was heated at 65°C for 15 minutes before reuse.
After two washes in 2X SSC/0.1% SDS, membranes were washed twice at 64°C in 0.5X
SSC/0.1% SDS.

Whereas M. luteus ATCC 147 exhibited multiple bands of hybridization, M.
luteus ATCC 383 showed a single dominant band for most of the digests. The Sac I
digest for M. luteus exhibited a relatively strong band of approximately 4 Kb. Multiple
Sac I digests were done for this genotype and separated on a 0.8% TAE-agarose gel.
DNA fragments approximately 3.5-4.5 Kb in size were excised and gel purified using a
QIAquick Gel Purification Kit.

In view of the above findings, M. luteus ATCC 383 was chosen for further study. 


B. Library construction for M. luteus 383; Identification of the carotenogenic 
operon 

The pUC18 vector (2.5 ug) was digested for 3 hours using Sac I restriction
enzyme to generate fragment ends compatible with the digested genomic DNA from M.
luteus ATCC 383. The Sac I-digested pUC18 was dephosphorylated using shrimp
alkaline phosphatase (SAP, Roche Diagnostics GmbH) and subsequently purified using
gel electrophoresis on a 0.8% TAE-agarose gel and a QIAquick Gel Purification kit as per
the manufacturer's instructions.

Purified insert DNA (60 ng) was ligated with 40-140 ng of prepared vector using
T4 DNA ligase at 16°C for 16 hours. A portion of the ligation reaction (1.2 uL) was
electroporated into 40 uL of E. coli ElectroMAX™ DH10B™ cells using standard
electroporation protocols. Transformations were plated on LB media containing 40
ug/mL of X-gal and 100 ug/mL of carbenicillin (LBCX). Once an appropriate plating
volume was determined, multiple transformations were conducted using remaining
portions of the ligation reaction and were plated to achieve individual colonies.
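
The insert and vector masses given for the ligation can be translated into approximate molar ratios. The following is a minimal sketch, not part of the original procedure. It assumes an insert of about 4,000 bp (the midpoint of the excised 3.5-4.5 Kb fraction) and a pUC18 length of 2,686 bp; the function name is illustrative only.

```python
def molar_ratio(insert_ng, insert_bp, vector_ng, vector_bp):
    """Return the insert:vector molar ratio for double-stranded DNA.

    Moles are proportional to mass / length, so the average mass per
    base pair cancels out of the ratio.
    """
    return (insert_ng / insert_bp) / (vector_ng / vector_bp)


INSERT_NG = 60      # insert mass stated in the text
INSERT_BP = 4000    # assumption: midpoint of the 3.5-4.5 Kb fraction
VECTOR_BP = 2686    # assumption: length of pUC18

for vector_ng in (40, 140):  # vector mass range stated in the text
    ratio = molar_ratio(INSERT_NG, INSERT_BP, vector_ng, VECTOR_BP)
    print(f"{vector_ng} ng vector -> insert:vector = {ratio:.2f}:1")
```

Under these assumptions the stated mass range corresponds to roughly equimolar insert at the low end of vector mass and an excess of vector at the high end.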

Individual, white colonies were patched in a 6 x 7 grid to 14 plates of LB with 100
ug/mL of carbenicillin (LBC). Upon growth, colonies were replica plated to new LBC
media. Colony lifts were made, according to standard procedures, using one of the sets of
plates. Plasmid DNA of the A. mediolanus Yl colony (5 ng) was spotted to some of the
membranes as a hybridization control. After baking, each membrane was treated with
600 uL of 1.67 mg/mL Proteinase K (Qiagen) diluted in 2X SSC and heated at 37°C for
1.25 hours. Membranes were then rinsed in 2X SSC on a shaker for one hour at room
temperature. Prehybridization, hybridization with the lctC probe, membrane washing,
and detection of hybridization were conducted as previously described.
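
The patching scheme implies a library of 6 x 7 x 14 colonies. As a rough illustration of how a hybridizing spot could be traced back to its plate and grid position, the sketch below maps a running colony index to a location. The row-major numbering and the reading of "6 x 7" as six rows by seven columns are assumptions, not details given in the text.

```python
ROWS, COLS, PLATES = 6, 7, 14   # grid size and plate count from the text


def grid_position(index):
    """Map a 0-based colony index to 1-based (plate, row, column)."""
    per_plate = ROWS * COLS
    if not 0 <= index < per_plate * PLATES:
        raise ValueError("colony index outside the patched library")
    plate, within = divmod(index, per_plate)
    row, col = divmod(within, COLS)
    return plate + 1, row + 1, col + 1


print(ROWS * COLS * PLATES)   # total number of patched colonies
print(grid_position(0))       # first colony
print(grid_position(587))     # last colony
```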

Twelve colonies were identified that hybridized above the background level.
Plasmid DNA was isolated from cultures of these colonies and digested with the
restriction enzyme Sac I to check insert size. Six colonies exhibited a single insert and
six showed multiple inserts. Four colonies with unique restriction patterns were
sequenced using M13R and M13F universal sequencing primers homologous to the
pUC19 vector. The M13F sequence of Clone 1, which had a single insert of
approximately 3.9 Kb, showed homology to known phytoene desaturases. The remainder
of this clone was sequenced by primer walking.

Homologies found for genes of interest are described in more detail in the BLAST
Analyses section below. The three ORFs that showed homology to the lctA, lctB, and
lctC genes of A. mediolanus were designated the lctA, lctB, and lctC genes of M. luteus
ATCC 383.

Genome walking was conducted to obtain the sequence of the C50-carotenoid
operon upstream of the phytoene desaturase fragment. Genome walk libraries were made
according to the protocol described for CLONTech's Universal Genome Walking Kit
(CLONTech Laboratories, Inc., Palo Alto, CA). The restriction enzymes Hinc II, Stu I
and Pvu II were used in making these libraries. The following primers were used in the
procedure:

GSP1F: 5'- TTCATGGACGTGCCCAGCAGCGTTGCCA -3' (SEQ ID NO: 42)
GSP2F: 5'- AGGTGGGCGAAGTCCGTGTAGAGGAAG -3' (SEQ ID NO: 43)

GSP1F and GSP2F are primers facing upstream and GSP2F is nested inside of
GSP1F. The addition of 5% DMSO to the PCR mixture was found to be necessary for
amplification. First round PCR was conducted in a Perkin Elmer 9700 Thermocycler
with 7 cycles consisting of 2 sec at 94°C and 3 min at 72°C and 34 cycles consisting of 2
sec at 94°C and 3 min at 66°C, with a final extension at 66°C for 4 min. Second round
PCR used 5 cycles consisting of 2 sec at 94°C and 3 min at 72°C and 24 cycles consisting
of 2 sec at 94°C and 3 min at 66°C, with a final extension at 66°C for 4 min. Nine uL of
the first round product and seven uL of the second round product were run on a 1.5%
TAE-agarose gel. A 0.9 Kb band was obtained for the second round product for the Hinc
II library. This fragment was gel purified using a QIAquick Gel Purification Kit. Four
uL of the purified DNA was ligated into pCR®II-TOPO vector and transformed by a
heat-shock method into TOP10 E. coli cells using a TOPO cloning procedure (Invitrogen,
Carlsbad, CA). Transformations were plated on LB media containing 100 ug/mL of
ampicillin and 50 ug/mL of X-gal.
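
The two cycling programs described above can be written down as data, which makes it easy to compare them or estimate run time. The sketch below is illustrative only: the data layout and function name are assumptions, and the total ignores ramp times between temperatures.

```python
# Each stage: (number of cycles, [(temperature in deg C, hold time in seconds), ...])
FIRST_ROUND = [
    (7,  [(94, 2), (72, 180)]),
    (34, [(94, 2), (66, 180)]),
    (1,  [(66, 240)]),           # final extension
]
SECOND_ROUND = [
    (5,  [(94, 2), (72, 180)]),
    (24, [(94, 2), (66, 180)]),
    (1,  [(66, 240)]),           # final extension
]


def hold_minutes(program):
    """Total programmed hold time in minutes, ignoring ramp times."""
    seconds = sum(cycles * sum(hold for _, hold in steps)
                  for cycles, steps in program)
    return seconds / 60


for name, program in (("first round", FIRST_ROUND), ("second round", SECOND_ROUND)):
    print(f"{name}: {hold_minutes(program):.1f} min of programmed holds")
```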

Individual, white colonies were screened by PCR using the GSP2F and AP2
primers. Individual colonies were resuspended in approximately 27 uL of 10 mM Tris and
2 uL of the resuspension was plated on LBK media (50 ug/mL kanamycin). The remaining
resuspension was heated for 10 minutes at 95°C to lyse the bacterial cells, and 2 uL of the
heated cells was used in a 25 uL PCR reaction. The PCR mix contained the following: 1X
Taq buffer, 0.2 uM each primer, 0.2 mM each dNTP, 5% DMSO (v/v), and 1 unit of Taq
polymerase per reaction. The PCR reaction was performed in a Perkin Elmer 9700
Thermocycler using the same program as used in the second round of genome walking.
PCR product was separated on a 1% TAE-agarose gel along with remnant second round
Hinc II product. Plasmid DNA for two colonies having inserts of the desired size was
sequenced with the AP2 and GSP2F primers. The sequence obtained showed homology
to known phytoene desaturases.
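
The final concentrations listed for the colony PCR mix can be converted to pipetting volumes with the usual dilution relation (stock concentration x volume = final concentration x reaction volume). The sketch below is a hedged illustration: the final concentrations, the 25 uL reaction volume and the 2 uL of lysed cells come from the text, whereas every stock concentration (10X buffer, 10 uM primers, 10 mM dNTPs, neat DMSO, 5 U/uL Taq) is an assumption chosen only for the example.

```python
REACTION_UL = 25.0   # reaction volume from the text

# (component, assumed stock concentration, final concentration from the text)
COMPONENTS = [
    ("Taq buffer (X)",      10.0, 1.0),
    ("GSP2F primer (uM)",   10.0, 0.2),
    ("AP2 primer (uM)",     10.0, 0.2),
    ("dNTP mix (mM each)",  10.0, 0.2),
    ("DMSO (% v/v)",       100.0, 5.0),
]
TAQ_UNITS = 1.0            # units per reaction, from the text
TAQ_STOCK_U_PER_UL = 5.0   # assumption


def volume_ul(stock, final, total=REACTION_UL):
    """Volume of stock needed so that stock * volume = final * total."""
    return final * total / stock


def mix_table(template_ul=2.0):
    """Return component volumes in uL, with water making up the remainder."""
    table = {name: volume_ul(stock, final) for name, stock, final in COMPONENTS}
    table["Taq polymerase"] = TAQ_UNITS / TAQ_STOCK_U_PER_UL
    table["lysed cells"] = template_ul
    table["water"] = REACTION_UL - sum(table.values())
    return table


for name, vol in mix_table().items():
    print(f"{name:22s} {vol:6.2f} uL")
```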

A second round of genome walking was conducted to obtain the remainder of the
C50-carotenoid producing operon. The following primers were designed from the
forward end of the sequence obtained from the first round of genome walking:

GSP1F2: 5'- AAGTAGGTGCGTCCGAGCTGGTCGTGGT -3' (SEQ ID NO: 44)
GSP2F2: 5'- GTCCGCGCCGAGATCCCGCAGGAAGTT -3' (SEQ ID NO: 45)

GSP1F2 and GSP2F2 are primers facing upstream and GSP2F2 is nested inside of
GSP1F2.
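
Basic properties of the four gene-specific primers, such as length and GC content, can be checked with a few lines of code. The sketch below uses the primer sequences exactly as listed (SEQ ID NOS: 42-45); the helper name is illustrative, and no particular design criterion is implied by the original text.

```python
PRIMERS = {
    "GSP1F":  "TTCATGGACGTGCCCAGCAGCGTTGCCA",
    "GSP2F":  "AGGTGGGCGAAGTCCGTGTAGAGGAAG",
    "GSP1F2": "AAGTAGGTGCGTCCGAGCTGGTCGTGGT",
    "GSP2F2": "GTCCGCGCCGAGATCCCGCAGGAAGTT",
}


def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, {gc_percent(seq):.1f}% GC")
```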

These primers were used in PCR as described above and in the Genome Walker
manual. A band of approximately 2.6 Kb was obtained for the second round PCR
reaction using the Pvu II library. This DNA was gel purified, ligated into pCR®II-TOPO
vector, and transformed into TOP10 E. coli cells using a TOPO cloning procedure.
Individual colonies were screened by PCR for insert size, as previously described, using
the AP2 and GSP2F2 primers. Plasmid DNA was obtained for a colony exhibiting an
insert of the desired size and was sequenced using the GSP2F2 and AP2 primers. The
remaining sequence for the insert was obtained by primer walking. PCR products for
several regions of the operon were also sequenced to confirm the DNA sequence.

The full sequence of the operon, obtained by colony hybridization and genome 
walking, is given in FIG 12. 

As seen in FIG 12, the operon isolated from M. luteus ATCC 383 comprises the
following genes in order of location in the operon:

* crtE, geranylgeranyl pyrophosphate synthase.

* crtB, phytoene synthase.

* crtI, phytoene dehydrogenase (phytoene desaturase).

* lctA of M. luteus ATCC 383, having homology with lctA of A.
mediolanus.

* lctB of M. luteus ATCC 383, having homology with lctB of A.
mediolanus.

* lctC of M. luteus ATCC 383, having homology with lctC of A.
mediolanus.


C. Confirmation of C50 production in M. luteus ATCC 383

C50 carotenoid (decaprenoxanthin) was produced in E. coli when the crtE-lctC
gene fragment from M. luteus was cloned into E. coli together with the idi gene from
E. coli on a pUC19 plasmid.

A gene construct containing the crtE, crtB, crtI, lctA, lctB and lctC genes was
inserted into the expression vector pProLarNde as described above. The idi gene from E.
coli was cloned into the vector pUC19. These two plasmids were co-transformed into
E. coli DH10B electrocompetent cells. Approximately 60 ng of the idi+pUC19 construct
and 240 ng of the crtE-lctC+pPRONde construct were used to electroporate 40 uL of
ElectroMAX DH10B™ competent cells. Electroporated cells were recovered in SOC
media for one hour and plated on LB plates containing 50 ug/ml of kanamycin and 50
ug/ml of carbenicillin. Colonies were obtained after incubation at 37°C and plated on LB
plates containing 50 ug/ml of kanamycin, 50 ug/ml of carbenicillin, 1 mM IPTG, and
2% L-arabinose (LBKCIA) to induce gene expression from both vectors. After incubation,
colonies were scraped off the plate and extracted by the DMSO method of An et al. Cells
were washed once with distilled water and once with acetone. The pellets were dried in
air and resuspended in one ml of DMSO preheated to 55°C. Glass beads were added to
each tube and the tubes were vortexed to resuspend the pellets. One ml of acetone was
added to extract the carotenoid, and one ml of hexane and two ml of 20% sodium chloride
solution were added and the tubes vortexed. The phases were separated by centrifugation
and the hexane phase was removed for carotenoid analysis. Spectrophotometric analysis
between 350 and 500 nm revealed that the carotenoid profile matched that expected for
decaprenoxanthin. These hexane carotenoid extracts were also subjected to mass
spectrometric analysis, and the expected mass ion of 705.3 was observed in the E. coli
double transformant, as well as two additional mass ions at 687.4 and 669.6 corresponding
to the loss of one and two water molecules, respectively. This mass of 705 (M+H) matches
that expected for decaprenoxanthin.
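
The reported mass ions can be compared with values calculated from an elemental formula. The sketch below is illustrative only and assumes the molecular formula C50H72O2 for decaprenoxanthin, which is not stated in the text; it computes the protonated monoisotopic ion and the ions expected after loss of one and two water molecules.

```python
MONOISOTOPIC = {"C": 12.000000, "H": 1.007825, "O": 15.994915}
PROTON = 1.007276

FORMULA = {"C": 50, "H": 72, "O": 2}   # assumed formula for decaprenoxanthin
OBSERVED = [705.3, 687.4, 669.6]       # mass ions reported in the text


def neutral_mass(formula):
    """Monoisotopic mass of a neutral molecule given element counts."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def expected_ions(formula, water_losses=2):
    """[M+H]+ followed by the ions after successive losses of water."""
    water = neutral_mass({"H": 2, "O": 1})
    protonated = neutral_mass(formula) + PROTON
    return [protonated - k * water for k in range(water_losses + 1)]


for observed, expected in zip(OBSERVED, expected_ions(FORMULA)):
    print(f"observed {observed:.1f}  calculated {expected:.2f}  "
          f"difference {observed - expected:+.2f}")
```

Under this assumed formula the calculated ions fall within a few tenths of a mass unit of the reported values, which is consistent with the assignment made in the text.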
D. BLAST analyses to determine homology between genes

BLAST searches of the above DNA sequence for M. luteus ATCC 383 against the
Swiss-Prot database identified the probable translational start and stop codons for the genes
in the C50-carotenoid operon. The geranylgeranyl pyrophosphate (GGPP) synthase gene
(crtE) for M. luteus ATCC 383 showed highest homology to the GGPP synthase gene of
Brevibacterium linens (33% identity). The M. luteus ATCC 383 phytoene synthase gene
(crtB) had highest homology to the phytoene synthase gene of Corynebacterium
glutamicum (31% identity), followed by that of Brevibacterium linens. The phytoene
desaturase gene (crtI) of M. luteus ATCC 383 showed highest homology to phytoene
desaturase/dehydrogenase genes in Brevibacterium linens, Corynebacterium glutamicum,
Halobacterium salinarium NRC-1, and Methanobacter thermautotrophicus, in order of
decreasing homology.

The only significant BLAST hits for the M. luteus ATCC 383 lctA and lctB genes
were to epsilon cyclase genes in Corynebacterium glutamicum (crtYe and crtYf,
respectively, of Krubasik et al., Eur. J. Biochem. 268: 3702-3708 (2001)). The lctC gene
of M. luteus ATCC 383 showed homology to lycopene elongase (crtEb of Krubasik et al.)
from Corynebacterium glutamicum, followed by ORFs in Deinococcus radiodurans and
Halobacterium salinarium NRC-1.

Alignments of genes from M. luteus, A. mediolanus, and C. glutamicum

The crtE (GGPP synthase genes), crtB (phytoene synthase genes),
crtI (phytoene desaturase genes), lctA, crtYe, lctB, crtYf, lctC, and crtEb genes from M.
luteus (Ml), A. mediolanus (Am), and C. glutamicum (Cg) were aligned. Alignments
were done using Align Plus software (Scientific and Educational Software, Durham, NC),
using the multiway protein alignment function in conjunction with the BLOSUM 62 matrix.

Results indicate that there is significant sequence identity shared between the
amino acid sequences. These results indicate that the sequences could be used as
substitutes for each other when they are used to create biosynthetic routes for generating
C40, C45, and/or C50 carotenoids. Tables 3-8 provide a summary of the results from the
alignments.
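
Percent sequence identity between two aligned protein sequences can be computed as sketched below. This is a minimal illustration, not the Align Plus implementation: the two short sequences are hypothetical, and the choice of denominator (all alignment columns that are not gaps in both sequences) is an assumption. Alignment tools differ in which denominator they use, so a reported percentage need not equal the number of matches divided by the length of either sequence.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (matches, compared columns, percent identity) for two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep every column except those that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return matches, len(columns), 100.0 * matches / len(columns)


# Hypothetical aligned fragments, for illustration only.
seq_1 = "MKT-LLVA"
seq_2 = "MRTALL-A"
matches, columns, identity = percent_identity(seq_1, seq_2)
print(f"{matches} matches over {columns} columns = {identity:.1f}% identity")
```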
Table 3

Gene       Start   End   Length   Matches   % Sequence Identity
Ml-crtE    1       366   366 aa   188       49% (Ml-crtE and Am-crtE)
Am-crtE    1       369   369 aa   207       54% (Am-crtE and Cg-crtE)
Cg-crtE    1       382   382 aa   158       40% (Cg-crtE and Ml-crtE)


Table 4

Gene       Start   End   Length   Matches   % Sequence Identity
Ml-crtB    1       331   331 aa   190       56% (Ml-crtB and Am-crtB)
Am-crtB    1       303   303 aa   178       56% (Am-crtB and Cg-crtB)
Cg-crtB    1       304   304 aa   304       47% (Cg-crtB and Ml-crtB)


Table 5

Gene       Start   End   Length   Matches   % Sequence Identity
Ml-crtI    1       543   543 aa   337       59% (Ml-crtI and Am-crtI)
Am-crtI    1       544   544 aa   364       65% (Am-crtI and Cg-crtI)
Cg-crtI    1       549   549 aa   308       54% (Cg-crtI and Ml-crtI)


Table 6

Gene       Start   End   Length   Matches   % Sequence Identity
Ml-lctA    1       115   115 aa   62        52% (Ml-lctA and Am-lctA)
Am-lctA    1       123   123 aa   67        45% (Am-lctA and Cg-crtYe)
Cg-crtYe   1       132   132 aa   62        48% (Cg-crtYe and Ml-lctA)


Table 7

Gene       Start   End   Length   Matches   % Sequence Identity
Ml-lctB    1       164   164 aa   69        44% (Ml-lctB and Am-lctB)
Am-lctB    1       115   115 aa   66        36% (Am-lctB and Cg-crtYf)
Cg-crtYf   1       130   130 aa   53        42% (Cg-crtYf and Ml-lctB)

Table 8

Gene       Start   End   Length   Matches   % Sequence Identity
Ml-lctC    1       291   291 aa   206       66% (Ml-lctC and Am-lctC)
Am-lctC    1       298   298 aa   199       57% (Am-lctC and Cg-crtEb)
Cg-crtEb   1       287   287 aa   166       70% (Cg-crtEb and Ml-lctC)


V. Conclusions 

The experiments described above allowed for the isolation of the following seven
(7) genes involved in the biosynthesis of the C50 carotenoid decaprenoxanthin in A.
mediolanus:

* isopentenyl pyrophosphate (diphosphate) isomerase (idi),

* geranylgeranyl pyrophosphate synthase (crtE),

* phytoene synthase (crtB),

* phytoene desaturase (crtI),

* lycopene ε-cyclase transferase A (lctA),

* lycopene ε-cyclase transferase B (lctB), and

* lycopene ε-cyclase transferase C (lctC).

Similar genes with substantial homology to the A. mediolanus genes were then
isolated from M. luteus. It is believed that other similar genes with substantial homology
could be isolated using similar techniques, and that such genes fall within the present
invention.

The experiments also show that the gene arrangement is conserved
between ORFs X1, X2 and Y, or the lctA, lctB and lctC genes, respectively. A schematic
comparison of the lctA, lctB and lctC genes from A. mediolanus and M. luteus with certain
genes from other bacteria is shown in FIG 9.

A schematic biosynthetic pathway, which is believed to summarize reactions of
the present invention, is shown in FIG 10. As has been shown, the lct genes code for
enzymes that react with the C40 carotenoid lycopene to perform two successive
ε-cyclizations, coupled to the addition of C5 residues at the 2 and 2' positions of the
resulting carotenoid, to form (successively) a C45 (dehydrogenans-P452) and a C50
(decaprenoxanthin) carotenoid.

The invention provides genes capable of converting a C40 carotenoid to a C50
carotenoid. These genes (lctA, lctB, and lctC) are the first example of a set of genes that
convert a C40 carotenoid to a C50 carotenoid in a single step. The three separate proteins
can be used to convert a C40 carotenoid to the C50 carotenoid in a single step.
Some alternate uses of the genes described in this report are listed below. Some
or all of the identified genes involved in lycopene biosynthesis (crtE, crtB, crtI) could be
used alone, or in combination with carotenogenic genes from other organisms, in order to
produce carotenoids such as (but not limited to) lycopene, β-carotene, lutein, zeaxanthin,
canthaxanthin or astaxanthin. The gene for isopentenyl pyrophosphate isomerase (idi)
could be utilized to increase the concentration of any carotenoids produced by a
microorganism. This idi gene could be used in a genetic background that includes none,
some or all of the other A. mediolanus carotenoid biosynthetic genes described here. A
gene for carotenoid glycosyl transferase (e.g., zeaxanthin glycosyl transferase (crtX)) in a
genetic background capable of producing dehydrogenans P-452 may be used to produce
dehydrogenans P-452 monoglucoside or, in a decaprenoxanthin-producing background,
to produce corynexanthin (decaprenoxanthin monoglucoside) or corynexanthin
monoglucoside. Use of a carotenoid desaturase gene that is capable of adding additional
conjugated double bonds to the C50 substrate will increase the antioxidant capacity of the
molecule and change the spectral properties of the molecule (i.e. increasing the λmax of the
carotenoid). As mentioned before, sequence similarity searches of the GenBank public
databases show three genes which have certain levels of homology to lctC. These genes
are from carotenogenic organisms (Deinococcus radiodurans, Halobacterium sp. NRC-1,
and Methanobacterium thermoautotrophicum) but their functions had not been previously
defined. Because of the level of similarity between the gene sequences, it is probable that
these three genes define a family of genes, all of which are involved in the conversion of
C40 carotenoids to C>40 carotenoids. The lct genes may be manipulated to perform
other, related functions. These may include (but are not limited to) addition of the C5
residue without the associated cyclization reaction and/or addition of the C5 residue with
a β-cyclization reaction (as opposed to the current ε-cyclization).

It is not difficult, through the use of additional enzymes like the FGPP synthase
combined with the genes isolated from A. mediolanus, to generate a fully conjugated
novel C50 carotenoid with greatly improved antioxidant potential as well as unique
absorption maxima. Such a molecule would result in carotenoids with novel colors.
Similarly, modified phytoene desaturases, created by shuffling or by using other
mutagenic techniques, could be employed with concepts of the present invention to
create additional high performance carotenoids.

OTHER EMBODIMENTS 

It is to be understood that while the invention has been described in conjunction
with the detailed description thereof, the foregoing description is intended to illustrate and
not limit the scope of the invention, which is defined by the scope of the appended claims.
Other aspects, advantages, and modifications are within the scope of the following
claims.

WHAT IS CLAIMED IS:

1. An isolated polypeptide comprising at least one amino acid sequence selected
from the group consisting of:

(a) the amino acid sequence set forth in SEQ ID NOS: 04, 05, 06, 10, 11, 12, 17,
18, 19, 20, 24, 25 or 26;

(b) an amino acid sequence having at least 10 contiguous amino acid residues of
the amino acid sequence set forth in SEQ ID NOS: 04, 05, 06, 10, 11, 12, 17, 18, 19, 20,
24, 25 or 26;

(c) an amino acid sequence having one or more conservative amino acid
substitutions within the amino acid sequence set forth in SEQ ID NOS: 04, 05, 06, 10, 11,
12, 17, 18, 19, 20, 24, 25 or 26; and

(d) an amino acid sequence having at least 65% sequence identity with the amino
acid sequences of (a) or (b).


2. An isolated nucleic acid molecule encoding said polypeptide of claim 1.

3. The nucleic acid molecule of claim 2, wherein said polypeptide is capable of
converting a C40 carotenoid to a C50 carotenoid.

4. The nucleic acid molecule of claim 2, wherein said polypeptide is capable of
converting a C40 carotenoid to a C45 carotenoid.

5. The nucleic acid molecule of claim 2, wherein said polypeptide is capable of
converting a C45 carotenoid to a C50 carotenoid.

6. The polypeptide of claim 1, wherein said polypeptide is capable of
synthesizing a C40 carotenoid.

7. A production cell comprising said nucleic acid molecule of claim 2.

8. An isolated nucleic acid molecule comprising a nucleic acid sequence selected
from the group consisting of:

(a) the nucleotide sequence set forth in SEQ ID NOS: 01, 02, 03, 07, 08, 09, 13,
14, 15, 16, 21, 22 or 23;

(b) a nucleic acid sequence having at least 10 contiguous nucleotides of the
nucleotide sequence set forth in SEQ ID NOS: 01, 02, 03, 07, 08, 09, 13, 14, 15, 16, 21,
22 or 23;

(c) a nucleic acid sequence that hybridizes under moderately stringent conditions
to the nucleotide sequence of (a); and

(d) a nucleic acid sequence having 65% sequence identity with the nucleic acid
sequence of (a) or (b).

9. A production cell comprising said nucleic acid molecule of claim 8. 

10. A method for making a C50 carotenoid, said method comprising contacting at
least one of said polypeptides of claim 1 with a C40 carotenoid such that said C50
carotenoid is made.

11. A method for making a C50 carotenoid, said method comprising culturing
said production cell of claim 7 under conditions wherein said C50 carotenoid is made.

12. A method for making a C45 carotenoid, said method comprising contacting at
least one said polypeptide of claim 1 with a C40 carotenoid such that said C45 carotenoid
is made.

13. A method for making a C45 carotenoid, said method comprising culturing the
production cell of claim 7 under conditions wherein said C45 carotenoid is made.

14. A method for making a polypeptide, said method comprising culturing said
production cell of claim 7 under conditions such that said polypeptide is made.
15. A specific binding agent that binds to said polypeptide of claim 1.

16. A method for making a C>40 carotenoid, said method comprising culturing a
production cell, wherein said production cell comprises an exogenous nucleic acid
molecule, wherein said exogenous nucleic acid molecule encodes a polypeptide that
elongates a C>40 carotenoid by at least one carbon atom, wherein the product produced
by said polypeptide is a carotenoid having a carbon backbone of >40 carbon atoms.

17. The method of claim 16, wherein said exogenous nucleic acid molecule 
comprises a nucleic acid sequence selected from the group consisting of: 

(a) the nucleotide sequence set forth in SEQ ID NOS: 01, 02, 03, 07, 08, 09, 13, 
14, 15, 16, 21, 22 or 23; 

(b) a nucleotide sequence having at least 10 consecutive nucleotides of the 
nucleotide sequence set forth in SEQ ID NOS: 01, 02, 03, 07, 08, 09, 13, 14, 15, 16, 21, 
22 or 23; 

(c) a nucleic acid sequence that hybridizes under moderately stringent conditions 
to the nucleotide sequence of (a); and 

(d) a nucleic acid sequence having 65% sequence identity with the nucleic acid 
sequence of (a) or (b). 

18. The method of claim 16, wherein said exogenous nucleic acid molecule 
encodes a polypeptide, said polypeptide comprising at least one amino acid sequence 
selected from the group consisting of: 

(a) the amino acid sequence of SEQ ID NOS: 04, 05, 06, 10, 11, 12, 17, 18, 19, 
20, 24, 25 or 26; 

(b) an amino acid sequence having at least 10 contiguous amino acid residues of 
the amino acid sequence set forth in SEQ ID NOS: 04, 05, 06, 10, 11, 12, 17, 18, 19, 20, 
24, 25 or 26; 

(c) an amino acid sequence having one or more conservative amino acid
substitutions within the amino acid sequence of SEQ ID NOS: 04, 05, 06, 10, 11, 12, 17,
18, 19, 20, 24, 25 or 26; and

(d) an amino acid sequence having at least 65% sequence identity with the amino
acid sequences of (a) or (b).
Agromyces mediolanus carotenogenic operon

Proposed translational start codons (bolded)

idi 1888
crtE 2505
crtB 3611
crtI 4584
ORFX1 6215
ORFX2 6583
ORFY 6927

Proposed translational stop codons (underlined)

idi 2508
crtE 3614
crtB 4587
crtI 6218
ORFX1 6586
ORFX2 6930
ORFY 7823
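
The listed start and stop positions imply a length for each open reading frame and an overlap between neighbouring genes. The sketch below is illustrative only. It assumes that positions are 1-based and that each stop position marks the last nucleotide of the stop codon; neither convention is stated in the listing. Under these assumptions, a length that is not a multiple of three would indicate that a coordinate does not follow the assumed convention or was transcribed incorrectly.

```python
# name: (start, stop), positions as listed for the A. mediolanus operon
ORFS = {
    "idi":   (1888, 2508),
    "crtE":  (2505, 3614),
    "crtB":  (3611, 4587),
    "crtI":  (4584, 6218),
    "ORFX1": (6215, 6586),
    "ORFX2": (6583, 6930),
    "ORFY":  (6927, 7823),
}


def orf_summary(start, stop):
    """Return (length in nt, in-frame flag, encoded residues or None)."""
    length = stop - start + 1                           # inclusive coordinates
    in_frame = length % 3 == 0
    residues = length // 3 - 1 if in_frame else None    # exclude the stop codon
    return length, in_frame, residues


def overlap(upstream, downstream):
    """Number of nucleotides shared by two consecutive ORFs (0 if none)."""
    return max(0, ORFS[upstream][1] - ORFS[downstream][0] + 1)


for name, (start, stop) in ORFS.items():
    length, in_frame, residues = orf_summary(start, stop)
    note = f"{residues} aa" if in_frame else "length not a multiple of 3"
    print(f"{name:6s} {start}-{stop}  {length} nt  {note}")
```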



1 ggatcacggg eagctcgacg ccgcgccggg cgagctcggc ctcgagtgeg gcctbcagce 

61 cgcggttctg ccggttgatc gggctgatgc cgccgaagtg gcggcagtgg tgggegaccfc 

121 cttcgaggcg ctcgccgggg atgccccggc cgctcgegac gttccgcagg aaggggatga 

181 cgtcgtcctg cccctcgggc ccgccgaagc cggccagcag gaccgcgtcg taggcgacgg' 

241 gctcggtgac gtgctcgggg cccgactggg cggcctcggc ggcaccgggc acgcaggcgc 

301 ccgaggcgca gtacgcctcg gcggcggggg ccggcttgcg gccgcgggcg gcctcgcgct 

361 cgggcgcggc ggctccggte gageccaggt tcgtcgcgga cattactgga gcacccccac 

421 gagctcggcg gtcgagatcc gtcgaeeggt gtagaacggg acctcttcgc gcacgcgcat 

481 gagggcgtcg gtggcgcgca gctcgcgcat gaggtcgacg agctcggtga gctcgfccgga 

541 ctcgatgggc agcagccacc cgcagtcgcc gagggcgaag gcgcccacgg tgtgggcgat 

601 cgcgccgcgg aaggtggcgc ccccgcggcc gtggtcggcg agcatgcggg agcgetcggc 

661 cgggtcgagc aggtaeoagt cgtagccgcg cacgaagggg tagaccgtca gccagccetb 

721 gggctcgatg ccgcgcagga agcecggcac gtgcgcctcg ttgaactcgg cgtcgcggtg 

781 cacgeecatg gcgttccacg tcggcagcag cgcgcgcagc aggcggctge gcttcagccc 

841 gcgcagcgcc cactgcaggc cctcggcggc ggagccgtgc agccagatca tgacgteggc 

901 gtcggcgcgg aggccggaga cgtcgtagag cccgcgcace gtgacgccct cgttctcgac 

961 gagcgcgatg acgccgtcga gtteggteac gaagcgcggc acategcgec cgfccgaggcc 

1021 atcggggcgc gcggggcccc tccggagcae ggcgaagagc gtgtagccct cgggcgacfcg 

1081 ctcgggctcg gacgegtgac ggagctcgtc ggatgcccct tcggcagcgg gggaagacat 

1141 acccccagtc tccctotfcfcc ecccggaagg Cccaaaaggg aggcgtcggc tccgccgaat 

1201 ggcgegggaa tccgcggacg gctcagtect gfcccggtcgc ggcgagegeg tcgaggaagc 

1261 ggatgacgac etcgcgctcc tccggcgoga gggccgccgc ggcggogaag cgtcgcgcgt 

1321 gctgacggcc gacggteteg cgggcgtcgt gccgggtgtg ctcggtgacg gcgaccgega 

1381 gggcgcggcg gtcgctcggg tggggcgatc gggtgacgtg gccgecgcgt tcgagccggt 

1441 cgaggagett cgtggtggag gcgctggaga tgccgaggtg ctcggcgagc gcgccgggcg 

1501 tcacgacgag cccefcggttg cgtgcggcga tgaggaagcg gafcggcgcgc atgtoggtct 

1561 ggttgagccg catgtagcgc cgggatgcct cgctcatgcg ctcggcggcg gcgtgccagc 

1621 cgcgcagcgc ccgcatgacg cgcaegacct ggtcgacctc cccgtcggcg agtccgccgc 

1681 ggtcgacgag ctcctcgtcg cggtcgaega tgcgcggatc gcgcatcgec gafctccacgc 

1741 gccggctgcg ctcccgccca cctgccatgt cgagattcta gccaagcgag acgaatcfccg 

1801 ctaagctact cactagecag gcgagatatt cgccgcagcg agggttcgga tcgagcacet 

1861 cgcgccggag ttgtcgaagg agccgacatg accgacccca gcabcaegcc gctgccggcc 

1921 caggccgcac cggtgcagco cgcatccagc gccgaatcgg tcgtgctgct cgacgaggcc 

1981 ggcaaccaga tcggcaccgc cccgaagtcg agcgtgcacg gcgaogaaac cgccctccat 

2041 cccgcgctct cotgccacgt cctcgacgac gacgnccgcc tcctggtgac ccgtcgogog 

2101 ctcggeaagg tcgcctggcc cggcgtgtgg accaaccccc cctgcgggda ccccgccccg 

2161 gccgagccgc tgccgcacgc ggcgcgcego cgggccgagt ccgagatagg cetcgagctc 

2221 cgcgacgtcg agccggtgct gccgctctto cgctaccggg cgacggatgc cfcogggcatc 

2281 gccgagcacg agatcbgccc ggtccacacg gcgcgcacaa gcccggtgcc ggc'gccgcat

2341 ccegaegagg tcctcgacct egectgggfcc gaaccgggag agctcgccsc cgcggcccgc 

2401 gccgegcect gggcgteeag tccctggctc gtgcbgcagg cgcagctgc-t gccctccccc 

2461 ggcggccacg ccgacgegcg cgtccgcacg gaagagetcg tctcgtcjagc ctcgtcgcga 

2521 ccgtggtcgc cccgagccgg caggcggagg tggagcgcta cctcggcgge ttottcgaeg 

2581 acgccaccgt gcgggccgae gegcangccg ccgactaocg gcggctcfcgg gcggcggcga 

2641 gggacgccgc gagcggcggc aagcggatcc gccccaggcc cgfcgetggge gcctacgacg

2701 cgcbcgccgc gcagggcgcg ccggcgagcg gccgcgaacg ggccgacgcc gagccggccg

2761 ccgccgcgga ggccgtggcg cccgcggcgg ccttcgagcc gctgcacacc gcgcccctcg

2821 Cgcacgacga cgfccatcgac cgcgaccccg tgcgccgggg cgagcccaac gtcgccggcc

2881 gcttcgcgct cgacgccgcg ctgcgcgggc tcgagcggga gcgggcggae gcctacggcc

2941 aggcctcggc gatccbcgcg ggcgacctgc tgatcgcggc ggcgcactcc gtggcggccg

3001 ccccgacgtg ccggtcgagc gccggcgagc caccctcgcc gtcctcgacg aagtgcgtct

3061 tcgccgccgc cgcgggcgag cacgccgacg Cccggcacgc cgccggggtg cggcccgggg

3121 aggcggacat ccccgcgatg atcgaggaca agacggcctg otaebegtfcc agcgcgccgc

3181 tccgggcggg cgcgctgctc gccggcgccc cgcgcgcgac ggtcgaacgg ctcggcgaga

3241 ccggccgtcg actcggcgtc gccttccagc tgeaggaega cgtgctcggc gtctacggcg

3301 acgagcgggt gaceggcaag acggcgctcg gggacctccg cgagggcaag gagacgctgc

3361 tcatcgccta cgcgcggggg cacgcggccc gggtcgcggc atccggcgcc ttcggccggc

3421 ecgaccccga cgaggcgggc gcccgccccc tccgcgcggc gatcgaggcg agcggcgecc

3481 gcgcccgegt cgaggcgcgc atcgccgagg aggcggccgc ggcgcgcacg gcgaccgccg

3541 cggcgggccb gcecgccgcg ctcgaagccg agttgctcgg cctcgccgcc gaagccacca

3601 ggaggtcgag gegaccgcgc taccgatcgg cgctgcgttg ctcggcctcg ccgccgaagc

3661 caceaggagg accaggtgag cacgcgcacc acccagcgca cgacegcgcc gcccgcaccg

3721 tccaccggcc fccgccctcta cgaccgcaec gccgccgagg gctcggcccg ggfccacccgg

3781 gcgcactcga cctccttcgg cctcgcgagc cggctctgct cccccgccgt ccgcgagqac

3841 ctcgccgagg tctacgcget cgtgcgcacc gccgacgagc tcgtcgacgg cccggccgag

3901 gaggccgggc tgccgtgcga gcgccgccgc gagctgctcg acgccctcga ggccgacacg

3961 gaggccgcct tcgagagcgg ocacagcgcc aacctcgcgg tgcaogccfct cgcgcgcgcg

4021 gcgcggogca gcggcttcgg ccaggagctc aeccggcccc tcctcgccfce gafcgcgacgc

4081 gaccccgagc ccabcgccfct caccgaggag cgcgagcbcg acgaacacgc ctacggctcg

4141 gccgaggCcg tcggcctgat gbgcctgcgc ggcttcgcga tcgggcccgc ccccgacgcc

4201 gagegcgacg cccgctggga gcgcggcgcg cgggcgctgg gctcggcgtt ccagcgggtc

4261 aacttcctgc gggacctcgg ggaggatgcc tcgccccgcg gacgccgcta ctccccgggc

4321 gtcgatccgg tgagcttctc ggaggccaag caaetgcgcc tccccgacgg eatcgacgcg

4381 gagctcgacg aggcggccgc cgtgatcccg gagchgccce gcggctgccg cgtcgcggtc

4441 gccgcggcgc acggcctgtt oggegagcbc tccgcccggc fcccgecgcgc gcccgcggcc

4501 gagctcgfcca cccggcgggc ccgggtgcee gcgccgcgca agctcgccat cgtcacccgc

4561 gtggtcgccc gcggaggccg gccgtgagcc gcgcggtcgt catcggcggc ggcatcgccg

4621 ggctcgccac ggcggcgctg ctcgcccgcg acgggcacga ggtgcggctc ttcgaggcgc

4681 gcgacgagcc cggcggccgt gccgggcgct ggcgggcgaa eggcttcccg ttcgacaccg

4741 gtccgagctg gtacctcafcg ccagaggtgt tcgagcactt cbaccgctfeg acgggc'acca

4801 cggcggcega ggagctcgag ctcgbgcgcc tcgaccccgg ctaccgggbg bacttcgagg

4861 gctacgacga gccggtcgac gcgcgggccg agcgcgaggc acccatcgce cfcctbcgagc

4921 cgaccgagcc gggcgcgggc gccgcgctcg cccggcacct cgactccgcc aacgagacgt

4981 accggctcgc gatgacgcae ttcctctaca ccgacttcgc ecacccgggg gcgctgctcg

5041 acgcgcoggt Ccggcggcgg cccggccggc tcgcgaagct gctgctcgaa cegctcgacc

5101 gcatggtggg gcgebccttc gacgacgtga ggctgcggca gatcctgggc tacocggcgg

5161 tcttcctcgg cacctagcce gagcgggcgc cgagcatgba ccacctgatg agccgcctcg

5221 acctcgccga cggggcgttc cacccgabgg gcggctCcgg cgagabcabc gcgagcgbgg

5281 cccggccggc ccggcgggcc ggggccgagc bcgtcaccgg cgcgcgggtg cteggeatcg

5341 agacggccgg cgggcgcgcc acgggcgtgc gcgcgeagca ecacggcccg accggtggca

5401 ecggcaccga ggagttcctg gaggcegagc tcgccgtctc cgccgccgat ctgcafccada

5461 cggatgccga gctgctcccg ccccgcgcgc ggacgcggag cgaggcabcc tggtcgcgc'c

5521 gcgaccccgg acccggcacg gtgctcgcca tgctcggcgt gcacgggcgg ctgccggagc

5581 tcgcccacca cacgetccgc ttcacggccg actggcgcac gaacccccag cgggbgfcbcg

5641 gctcgcgacc ggcgabcocc gaeccggcgt cgccccacgb ctgccgcccg agtgcgacgg

5701 atccgggcgt ggcgcccccc ggctgcgaga acctgctcct gcbcgbgccg gtgcccgccg

5761 accccacaafc cggcgccggc ggbgtcgacg gccgcggcga ccgggcggtc gaggagacgg

5821 ccgaccgggc gatcgcgacc ctcgocgagt gggccggcat ccccgacctc gccgagcgga

5881 ccctcgtgcg ccgcacgatc gggcccgcgg acbbcgagga ctggctccag tcctggcgcg

5941 gctcggcgct cggcccgggg cacaccctgc ggcagagogc catgttccgg gggcgcacgg

6001 cabcggegaa cgtcgagggg ctgtacttcg cgggggcgac gacgabcccg ggcatcggcc

6061 tgccgatgtg cctgaccagc gccgagctcg tcgcgaaggc cgCgcgcggc-.gaggabgccc

6121 cgggcccgcc cccggagccg agcgaggagc cgcacccaga cccgctgcac ccagacacgc

6181 tgcacccaga ccggctcgac cgggagcgca ccggatgacc ttcctccicc tggggctgct

6241 gctcgcctcg atcgcgtgca tcgcgctcgt cgaegcgcgc taccggatgt tcbb'cbggcg

6301 ggcgccgccg cgggcgacgg tcgtggtcgc ccccggcgbc gcgatgcccc ccgtctggga

6361 cctefcggggc acctcgctcg geafccbtccfc ccgcgagccg aatgccfcact cgacgggget

6421 gctcattgcg ccgcacctgc cgabcgagga gccggcgccc ctcgccttcc tctgccagcc

6481 cgcgatggbc ggctacacgg gaccgctgcg ccbcctcgcg caccgatccg cgcagcccgc

6541 caccggcccc gctgccgaat ccaccgccga aggggccogc oga^agcta cgccgcgctc

6601 tgcctcccgt tectcgccgt ctcggcggtg ctcgccgcga Lcgcchggcg acgtgccccg

6661 gccggtcacg cggccgcgcc cgcgcbcacg gcgggcggcc tcgbgcbccb caccgcggcg

6721 cecgactcgc tgacgaccgc cgcgggccfcg ttcgactacg ecgacgcgcc cctgctcggc

6781 ccgcgcctcg ggctcgcccc gatcgaggae ttcgcctacc cgatcgccgc goCgotgctc 

6841 tgctccacgg tcfcggacgcc gctcgggcga gcggacgccc cggcggctcg tgaccggcec 

6901 gcecgcgcgc ccagaggagc egagcgatga gcgccgtcgg cgccgaggca tccggccagc 

6961 gcctgctccc cgcgctcttc accgcaccgc gcccgctgag ctggatcaac accgccttco 

7021 cgfctcgcggc cgcgtacctg ctgaccgtgc gcgaggtcga cgtcgcgctc gtcgtcggca 

7081 ccctgttctt ccccgcgccg tacaacctcg cgatgtaegg cstcaacgac gtcttcgact 

7141 Ccgagtccga cgcgcggaat ccgcgcaagg gcggcgccga gggggccccg cegoegcscg 

7201 cccggcatcg cgcggcgctg atagcogcgg tggccccgac ggtgccgttc gccgtctggc 

7261 tcgcgctgct eggcggcccg tggtcgtggg cctggctcgc gctcagectg tccgccgtgg 

7321 tggcgnaccc ggcgcegggc ctcaggttca aggagatccc ggggcctgac tccctcaccc 

7381 cgagcacgca cttcgtctcg cecgcctgct acgggetcgc cctcgcgggg gcgacggCga 

7441 cgccgcagcfc cgtgccgctg ctgctcgcgt tcttcgtgtg gggcgtcgcg agccacgccc 

7501 tcggcgcggt gcaggacgtc gtgcccgatc gcgaggccgg gaccgggtcg atcgcgaccg 

7561 cgctgggggc eegeegcacg acccggctcg cgatcggcce ctggotgcte gcgggcgtgc 

7621 tgatgctcgg cacgtcgtgg ccggggccgc tcgccgeggt actcgccgtg ccgtacctcg 

7681 tcgcggcgtg gccgcaccgc toggtgagcg acgccgagfcc ggcgcgcgcg aacggcggct 

7741 ggcgceggtt cctcgcgarc aactaoggcg tcggcttcgc ggcgacgatg ctgctgatct 

7801 ggtacgcgct gctcscggcc £22gcagtcg ctccgcggng agggcgcgag tccgcgkcjcg 

7BS1 cgtcaccgcc cgtcgagggg cgtcactccc cgtcgagggg cggcgatccg agcaggagcc 

7921 eggtcgageg ggcgatgtgc egccgcatcg ccccgacgec cfccggcccgc acetcggcga 

7981 gcagctcgcg gtgctcgacg acgagttcgg cgaggctgta gtgccggcgg atgtgcagcs 

8041 ggaagagccg cagctcggcg ccgagcgcgg cgtgcgcctc gacgatcctg gggccgacgc 

8101 tggcggcgac gatcgcccgg tgcacctcga ggtggagccg ctcggcctcg agccaggccg 

SlSl gggtcgcccc gagotogccc ggaggacgca gcgectcctc ggtgacggcg agccgggcga 

8221 gctcgtcgag ggcgagcatc gccggggcga gcgccgactc gggccagtgc gagccgtagc 

8281 ggtcgcccgc gatgcgtacc gcttcgacet cgagcgcctc gcgcagttgc tgcagcgcga 

8341 gcacccgcgc gbggtcgaac tcggcgaccc gcactccgcg gcagggcgcc gactcggcga 

8401 gccgctcaga gaagagccgc tggaacgcgg ogcgcacggt gtgccgggac accccg^agc 

8451 gctcggccgc ctgctcctcg cgeagcggcg cccccgaggc cagcgcgccg otcaggatct 

8521 cgfceacggag cgcatcggcc afcccgctcga cggcggtggg cgccggcatg acgcggcggc 

8581 tcagtcgtcg 'cfcgacggcag cgcgcacgac gagggcgacg acgcaggcga ccacgacgac 

8641 cgccccgata c 
ATCC 383 Micrococcus luteus C50-carotenoid producing operon 

Proposed translational start codons (bolded) 

crtE (GGPP synthase) 688 

crtB (phytoene synthase) 1788 

crtI (phytoene desaturase) 2780 

IctA (having homology with IctA of A. mediolanus) 4411 

IctB (having homology with IctB of A. mediolanus) 4755 

IctC (having homology with IctC of A. mediolanus) 5243 

Proposed translational stop codons (underlined) 

crtE (GGPP synthase) 1789 

crtB (phytoene synthase) 2781 

crtI (phytoene desaturase) 4409 

IctA (having homology with IctA of A. mediolanus) 4756 

IctB (having homology with IctB of A. mediolanus) 5247 

IctC (having homology with IctC of A. mediolanus) 6116 

1 ctgcccccgc tgctcgtgca cgccatccgg ttcggeggcg getaeggggg tgcggtggtg 

61 cgggccctgc gccagctcgg gtgaccccgc ccgtggttgg acaggacccg ccgctgtcca 

121 geatgaeggt tattagaatt tctagtagtt aegaggeggg agtcaeeggg tgacggagaa 

181 cggagcgtgg agtgcgagcg tgagcccgca gtcgcgcgcg ctgcgtcggc tggtgcggct 

241 gaacgagggg ategggtace agatccgccg cctcatgggc ctgaaggaaa ccgaetaetc 

301 cgccatggcc ctgctcttgc ggagtccgat ggggccaacc gacctggeee acgctctgca 

361 catcaccacc -gcttccgcca cggccgtggt ggaccggctc gcacgggccg gtcacgtggt 

421 gcgtgaaccg caeggagagg accgccgecg catgacegtg cgggccgtgg ccggatcccg 

481 tgagcaggtg egggagcacg tggtgcccat gatggacatg gtcgaggagg agatcgegeg 

541 getggacgag tccggccgcg gggcegtcet gcagttCctc accggcaccg ccgaccgcat 

601 ggaggactac ctggcgggtc tgcgcgaacg cccggccggc actggcggcg ccacccaggg 

661 catgcccggc oceggggegg agegceeatg accteggaga cagacaccgc ggeggatccc 

721 accgcggtct gggatgtgtt ccgcgcggcc gttgaccggg agctggacga gttcttcgac 

781 tcccegcgea acagggfctcc ctacagcccg ggctteeegg tgatgtggga tcgcacccgg 

841 eagcaggtgg tgggcggcaa getgatcegg ccccgtctga cgcagatcgc gtggcgctcg 

301 ttcgccggtg agtcgagcac > tgaeteeggc egagaggecg agtgcgtgcg cctggcggcg 

961 tegttcgaga Cgetgcacgc ggegctgate gtgcacgacg aegtegtgga ccgggactgg 

1021 cgccgtcgtg ggcggcccac ggtgggcgag ctcttccgcc gcgacgcggt geaggegggg 

1031 gccdccgagg gegaggcega geacgegggg gagtccgegg cgatcctcgc gggagacctg 

1141 ctfcctggcgg gtgcgctgcg gctggcgaoc acgtgcaccg aggacceggg gcggggacgt 

1201 gccgtggcag acgtggtctt egaggeggtg accgcgtccg cggccggtga gctggacgac 

1261 ctcctgctct ctctgcaccg ctacggcgcg gagcacccgg gcgtgcagga catcctggac 

1321 atggagcggc tgaagacege cacgtactcg ttcgaggcac ccctgcgcgc cggcgccctg 

1381 etcgegggag cgcccgagga gcaggcccag cgcctggcgc gggccggcgc ccagctcggg 

1441 gtggcctacc aggtegtega cgacgtcctg ggaaccttcg gcgaccccga gctcaccggc 

1501 aagtcggtgg acgccgatct gaacteggge aaggccaccg ,tgctcaccgc ecaeggaatg 

1561 cagacccccg cggtgcggga cgtcctcgcg gagctcgegg ccgggcgtac cacggtcgcc 

1621 tccgcgcggg ctgccctgac ggcgtcggga gegcaggagg cagccgtggc agtggccaeg 

1631 gacctcgtgg accgggcccg ggccaccctg gaeggtctea cgctgcccgc tgcccagcgc 

1741 geggagcteg acgcgctgtg ccaccacgtc ctgaacagag actegtagtg aggaccccca 

1801 ccatgcccca ggacgcaccg gccgacgcgc cgctgagcct ctacaccgcc aecgcgctgg 

18S1 cggcctcggg cgcggtgatc gggegctact ccacgtcctt ptcgcitggcg tgccggaccc 

1921 tgeeggegge ggtgcgccgg gaeatcgegg ggatctaege cctcgtgcgc gtggcggacg 

1981 aggfcggtgga cgggacggcc ggggeggegg gtctcggcgc ggaccgggtg cgcgcggcgc 

2041 tegacgegta egaggecgag gtggcctccg cgctcgecac gggcttctcg accgaoccgg / 

2101 tggtccaegg cttcgcgggc gtcgcccgcc gtcaeggett eggcaeggag ctcacggagc . 

2161 cgttcttcgc gtccatgcgc gcggacctgg acgfcggccga gcacgacggc gcctcgcttg ' 

2221 agtcctacat ctacggcccg geggaggteg fcggggctgat gtgcctggag gtettcatgg 

22S1 acatgcccgg eacccgcgcc cagaccccgg ageageggga gatgotgege ' gccacggccc 
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2341 gccggctggg tgccgcgttc cagaaggtca 

24 01 aceagctcgg acgcacctac ttccccggcg 

24 SI agcggctgct gctcgcggac ctcggcgcgg 

2521 cgctggaccg ccgtgccggg cgcgcggtgc 

2591 cacggcggat cgaggaggtg cccgcggcgg 

2S41 ccggggtgaa gccgcggatc gccgcgagag 

2701 acgggcgggg ccgagcccta gagtcggggc 

2751 cccggacggg ggccacccga. tgacgcgcae 

2 821 ggccacggcg ggcctgctcg cccgggacgg 
2881 cacggtgggc ggccgctccg ggcggtggfcc 
2941 cagctggtac ctcatgcccg aggtgatcga 
3001 cgccgagcag ctggacctgc gccggctgga 
30 SI cctggcggaa ccgcccacgg acgtggtcac 
3121 cgacccggga tcctcccgcg cactgcgctc 
3181 gctcgccaag aagcacttcc tctacacgga 
3241 ggaggtgctc cgcaacctcc cgcggctggc 
3301 cgttgcgcgc cgttttccgg agccgcggca 
3361 cctgggggcg tccccctcgt ccgccccggc 
3421 caccgacgga gtgcagtacc cggtgggcgg 
3481 gctcgtgcgc gaggccggcg tggagatcgt 
3541 ggctcccgag ccgcggtcgc cgcgttcccg 
3501 cgccggcacg gtcacgggcg tcaccttccg 
3661 gccgggeggc gtcgfccgccg gtgcggaggt 
3721 cgcggacctg caccacotcc agacccgcct 
3781 ccgcfcggaag cgccgcgacc ccgggccctc 

3 841 gaagctgccg cagctggccc accacaacct 
3901 cgggcgcatc gagtccggtg cggacctggc 
3961 gtcggcgacg .gatcccggca ccgcgcccga 
4021 ctcgcccgcg gcacccgagt ggggtcacgg 
4081 cggctccgcg caggtggagc gggtcgctga 
4141 gcagatcccg gaactggcct cgcggatcgt 
4201 cgcggtgggg gtcaacgcgt ggcgcggctc 
4261 gcccgcgatg ttccgtccca gcgtcaccga 
4321 gtcctcggtg cgcccgggga ccggcgtgcc 
43B1 ggacgccgtg cgggagagcg gggcgcgctg 
4441 ggctgtttcg cgctcatcga ccggcgctgg 
4501 cgggcctggc tcgtgctggt caccggggtg 
4561 atcgccaacg gactgttctg' gcacggcgag 
4621 cccgagctgc ccctggaaga ggtcttcttc 
4681 nacgtgctcg gegcgcccgt gctgtggcgg 
4741 gcggggaggc gggcatgacg tactggggcg 
4801 tcgtgctgct gacgacggcg cCqgtgeggc 
4861 cggcctccac agtgctgete gtggtgctca 
4921 ccgggatcat gacgtacacg gaccgcaaca 
4981 tggaggactt cgcctacccc gtggccggtg 
5041 tgggaggcac gcccggggcg gcggccggtg 
5101 ccgcggtcgc agccgcaacc gcagccggcg 
5161 cggacaccga tggtacgagc accgggcgcg 
5221 ccgccgatgg aagggacgaa ccgtgccgag 
5281 cfcgggtgaac accgcctacc cgttcgcggc 
5341 gtggctcgtg gcgctggggg ccgtgttctt 
5401 catcaacgae gtcttcgact acgagtcgga 
5461 gggcgcggtg gtggatcgcg ccgcceagcg 
5521 ggtgccgtcc gtcgcggtgc tggcggggta 
5581 gctggtgctg gcggtgagcc tgttcgcggt 
5641 taaggagcgc ccgttcgtgg atgcgatgac 
5701 ctacggactg gtgctcgcac gggcggactt 
5761 cttcttcctg tggggcatgg cctcgoagat 
5821 ccgtgagggt gggctggcct cagtggccac 
Sa81 cgcggcgggc ctctacgccc tcgcaggtgc 



acttcctgcg ggatctcggc gcggaccacg 
cggacccctc ccacctggac gagacccgca 
acctggacgc ggccgtgccc gggatcctcg 
tgatcgcgca cggactgttc ggcgagctcg 
agcccacacg acggcgcatc agcgtgcccg 
cgctgtccgt caccgcgcgc acgggctcac 
ccccggtgcc ggcggccgtg cccgaaacct 
ggtggtgafcc ggcggcggct tcgcgggcct 
gcacagcgtc accctgctcg agcagcagga 
cgcggagggc ttctcgttcg acaccggacc 
ccgctggttc accctgatgg gcacgagcgc 
ccagggctae egcgtctfcet tcgaggacca 
cggtegtgcc gaggagctgt tcgagagcct 
ctacctggac tcgggcgcgc aggtctacga 
cttcgcccac ccgctggacc ttgtgcgccc 
aacgctgctg ggcacgtcca tgaagaacta 
gcgccagatc ctgggctacc ccgccgtctt 
catgtaccac ctcatgagcc acctggacct 
gttcgccgcg ctggtggacg ccatggaacg 
cacgggagcc accgtgaccg gcatcgaggt 
gttggccgca gcccgggcac gacgtcgcac 
cacggcgccg ggggcggacc cggggacgga 
caccgtgccc gcggacgtcg tcgtcggcgc 
gcttcccggc ccgttccgcg caccggagtc 
cggggtgctc gtgtgcctgg gcgtgcgcgg 
gctgttcacc gcggactggg atgagaaatt 
cgaggagacc tcgatctacg tgtccacgac 
gggggacgag aacctgttca tcctggtgcc 
cggaacoacc gccccgggcg fcegacgagcc 
cgccgccatc gcgcagctcg cgcgctgggc 
ggtgcgcagg acctacgggc ccgaggactt 
cctgatgggc cccggacaca tfcctgacgca 
ccgtgggatc cgggggctgt tctacgccgg 
catgtgcctg atctcctccg aggtggtgcg 
atgtacctgc tcctgctgct cgtcctcctg 
aacctgtact tctggtccgg acacecgctg 
gtgttcttcc tcgcgtggga cetggtgggg 
aactccctga ecctggggat cttcgtggct 
ctcgcgttcc tctgctacca gaccafcggtc 
tggctgaggg cccgcaccgg cgcggcacac 
tgaacgcggt efctcctgggg atggcggcgg 
gcccacccgc ccggttccgg ggagcgctcg 
ccgccgtctt cgacaacgtc atgatcgcct 
tctcgggcgt gcggatcggg ctcgccccgc 
tgctgctgct gccgacgatg tggctgctgc 
acgggcgggc gacggcggcg tcgtcgtccc 
cgggcgacga gaacgcgagc ggtgaggaeg 
cacatgccgg gggcaggccc agtgggaacc 
gaegctgttc tgggcctcgc gcccgctgag 
ggccgtgctg cfcgacgggcg gtttgccctg 
cctggtgccc tacaacctgg cgatgtacgg 
cctgcgcaac ccccgcaagg gcggcgtgga 
cggcgtgctg cgggcctcgt gcctgctgcc 
cgggatcgtg accgggaacc tgctgticcgt 
ggtcgcgtac tcgtgggcgg ggetgegctt 
ctccgccacc cacttcgtct cgcccgcegt . 
cacggtgggg ctgtgggcgg tgctcgtggg ,' 
gttcggggcg gtgcaggacg tggtaccgga ' 
cgtgctcggt gcgcgcccca cogtgtggct 
cctgatgctg ctcgcccagt ggccgggtca 
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5941 gctcgcggcg ctgcccgcgg Cgccgfcaeet 

6001 ggacgaggac tccggecggg ccaacgccgg 

6061 gaccggtttc ctggtcacga tgetgcegat 

6121 ggatgcccaa cgcccgggac cggtgcggcc 

6181 cccgcggtct gcgtgcccgg ggctggcatc 

6241 ctcacaccgc ccgggtcaac gacacgeagc 

6201 cggtcgtgcc gccgcacggc cacggctacg 

63 61 tgctcgttga ccagggatgg cgcgtgatcg 

6421 aagtcacgcc gggcatcgtc tacaccgagg 

64S1 accgcctggg cctggactca gtggtgctgg 

6541 tgcagattgc tgcgacccac cctgagcggg 

6601 cgcacgccga gaacgcggcg gggcggcgCc 

6661 cgggcgggat gccggcctac gcggacaggg 

6721 tggaacggct gcctgtggtg gccgacacgg 

6781 agggggcgga cgcggccatg cgcgggcgtg 

6841 gggcgtggcg caagcccgcg ctcgtggtcg 

6901 cggcccggcg gatggccgag ctgctgccgc 



ggtcaacgcg ctgcgcttcc ggggcgtcac 
gtggaggaog ttcctgtggt tgaactacgc 
ctggtgggcc cgggttcacg tget gcqa ac 
cggcctggfcg aggcccggcc tggtgcatgg 
atgggcgcat gagccgatcg acgttcgcca 
tcgcctacac ggacgagggg cagggtctgg 
accgctcaat gtgggacgcg cagatcccgg 
ccccggacct gcgcggcttc ggagattcgg 
agttcgcggc ggacaccatc gcgccgctgg 
tggggttttc gatggcgggg caggtggccc 
tggccgcgct ggbcgtcaac gacacggtgc 
gtcgtcacgt gggcgcggac gggaCcctga 
tgctcgcctc catgatccgc gaggacaacg 
tgcgcgagat gatcgccgcg tgtccggcgg 
ccgagcgcaa cgacttcacc gagacgctgc 
tgggggacgg ggacgcgttc gacggcggcg 
acggcgagct c 



FIG. 12 
SEQUENCE LISTING 

The nucleic and amino acid sequences listed in the accompanying sequence 
listing are shown using standard letter abbreviations for nucleotide bases, and 
three-letter code for amino acids. Only one strand of each nucleic acid sequence 
is shown, but the complementary strand is understood to be included by any 
reference to the displayed strand. 
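
The strand convention above can be illustrated with a short sketch. This is a minimal example, not part of the original listing: the helper name `reverse_complement` is hypothetical, and it assumes the lowercase `a`, `c`, `g`, `t`, `n` letters and space-separated blocks of ten bases used throughout this listing.

```python
# Watson-Crick pairing for the lowercase base letters used in this listing.
# Assumption: only a, c, g, t and n (unknown base) occur in the input.
_COMPLEMENT = str.maketrans("acgtn", "tgcan")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5' to 3'.

    Whitespace (the gaps between ten-base blocks) is removed first.
    """
    cleaned = "".join(seq.split()).lower()
    return cleaned.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # First two blocks of SEQ NO: 01 as printed in the listing.
    print(reverse_complement("atgaccttcc tccacctggg"))
```

Applying the function twice returns the displayed strand with the block spacing removed, which is a quick sanity check when a sequence is copied by hand.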

SEQ NO: 01 is the nucleic acid sequence for the IctA gene isolated from A. 
mediolanus. 

1 atgaccttcc tccacctggg gctgctgctc gcctcgatcg cgtgcatcgc 
51 gctcgtcgac gcgcgctacc ggctgttctt ctggcgggcg ccgctgcggg 
101 cgacggtcgt ggtcgccctc ggcgtcgcga tgctcctcgt ctgggacctc 
151 tggggcatct cgctcggcat cttcttccgc gagccgaatg cctactcgac 
201 ggggctgctc attgcgccgc acctgccgat cgaggagccg gtgttcctcg 
251 ccttcctctg ccagctcgcg atggtcggct acacgggact gctgcgcctc 
301 ctcgcgcacc gatccgcgca gcccgccacc ggccccgctg ccgactccac 
351 cgccgaaggg gcccgccgat ga 

SEQ NO: 02 is the nucleic acid sequence for the IctB gene isolated from A. 
mediolanus. 

1 atgagctacg ccgtgctctg cctcccgttc ctcgccgtct cggcggtgct 
51 cgccgcgatc gcctggcgac gtgctccggc cggtcacgcg gccgcgctcg 
101 cgctcacggc gggcggcctc gtgctcctca ccgcggtgtt cgactcgctg 
151 atgatcgccg cgggcctgtt cgactacgcc gacgcgcccc tgctcggccc 
201 gcgcctcggg ctcgccccga tcgaggactt cgcctacccg atcgccgcgc 
251 tgctgctctg ctccacggtc tggacgctgc tcgggcgagc ggatgcctcg 
301 gcggctcgtg accggcccgc ccgcgcgccc agaggagccg agcgatga 

SEQ NO: 03 is the nucleic acid sequence for the IctC gene isolated from A. 
mediolanus. 

1 atgagcgccg tcggcgccga ggcatccggc cagcgcctgc tccccgcgct 

51 cttcaccgca tcgcgcccgc tgagctggat caacaccgcc ttcccgttcg 

101 cggccgcgta cctgctgacc gtgcgcgagg tcgacgtcgc gctcgtcgtc 

151 ggcaccctgt tcttcctcgt gccgtacaac ctcgcgatgt acggcatcaa 

201 cgacgtcttc gacttcgagt ccgacgcgcg gaatccgcgc aagggcggcg 

251 tcgagggggc cctgctgccg cccgcccggc atcgcgcggt gctgatcgcc 

301 gcggtggccc tgacggtgcc gttcgtcgtc tggctcgtgc tgctcggcgg 

351 cccgtggtcg tgggcctggc tcgcgctcag cctgttcgcc gtggtggcgt 

401 actcggcgcc gggcctcagg ttcaaggaga tcccggggcc tgactccctc 

451 acctcgagca cgcacttcgt ctcgcccgcc tgctacgggc tcgccctcgc 

501 gggggcgacg gtgacgccgc agctcgtgct gctgctgctc gcgttcttcg 

551 tgtggggcgt cgcgagccac gccttcggcg cggtgcagga cgtcgtgccc 

601 gatcgcgagg ccgggatcgg gtcgatcgcg accgcgctgg gggcccgccg 

651 cacgacccgg ctcgcgatcg gcctctggct gctcgcgggc gtgctgatgc 

701 tcggcacgtc gtggccgggg ccgctcgccg cggtactcgc cgtgccgtac 

751 ctcgtcgcgg cgtggccgta ccgctcggtg agcgacgccg agtcggcgcg 

801 cgcgaacggc ggctggcgct ggttcctcgc gatcaactac ggcgtcggct 

851 tcgcggcgac gatgctgctg atctggtacg cgctgctcac ggcctga 
SEQ NO: 04 is the amino acid sequence encoded by SEQ NO: 01. 

1 mtflhlglll asiacialvd aryrlffwra plratvvval gvamllvwdl 
51 wgislgiffr epnaystgll iaphlpieep vflaflcqla mvgytgllrl 
101 lahrsaqpat gpaadstaeg arr 
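
The relationship between a nucleic acid entry and the amino acid entry it encodes can be checked with a short translation sketch. This is an illustrative example only: the function name `translate` is hypothetical, it assumes the standard codon table, and it prints lowercase one-letter residue codes because that is how the amino acid sequences appear in this listing. The first codon is translated literally, so a `gtg` start reads as `v`, matching the printed entries.

```python
# Standard codon table built from the conventional t, c, a, g ordering.
_BASES = "tcag"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODON_TABLE = {
    b1 + b2 + b3: _AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(_BASES)
    for j, b2 in enumerate(_BASES)
    for k, b3 in enumerate(_BASES)
}


def translate(seq: str) -> str:
    """Translate a coding sequence from its first base to the first stop codon.

    Whitespace between ten-base blocks is ignored. Codons containing
    anything other than a, c, g, t are reported as 'x'.
    """
    cleaned = "".join(seq.split()).lower()
    residues = []
    for pos in range(0, len(cleaned) - 2, 3):
        amino = _CODON_TABLE.get(cleaned[pos:pos + 3], "x")
        if amino == "*":  # stop codon: end of the open reading frame
            break
        residues.append(amino.lower())
    return "".join(residues)


if __name__ == "__main__":
    # First three blocks of SEQ NO: 01; compare with the start of SEQ NO: 04.
    print(translate("atgaccttcc tccacctggg gctgctgctc"))
```

Running the same function over a whole nucleic acid entry and comparing the result with the paired amino acid entry is a simple way to spot transcription errors in either one.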

SEQ NO: 05 is the amino acid sequence encoded by SEQ NO: 02. 

1 msyavlclpf lavsavlaai awrrapagha aalaltaggl vlltavfdsl 
51 miaaglfdya dapllgprlg lapiedfayp iaalllcstv wtllgradas 
101 aardrparap rgaer 

SEQ NO: 06 is the amino acid sequence encoded by SEQ NO: 03. 

1 msavgaeasg qrllpalfta srplswinta fpfaaayllt vrevdvalvv 

51 gtlfflvpyn lamygindvf dfesdarnpr kggvegallp parhravlia 

101 avaltvpfvv wlvllggpws wawlalslfa vvaysapglr fkeipgpdsl 

151 tssthfvspa cyglalagat vtpqlvllll affvwgvash afgavqdvvp 

201 dreagigsia talgarrttr laiglwllag vlmlgtswpg plaavlavpy 

251 lvaawpyrsv sdaesarang gwrwflainy gvgfaatmll iwyallta 

SEQ NO: 07 is the nucleic acid sequence for the IctA gene isolated from M. 
luteus. 

1 atgtacctgc tcctgctgct cgtcctcctg ggctgtttcg cgctcatcga 

51 ccggcgctgg aacctgtact tctggtccgg acacccgctg cgggcctggc 

101 tcgtgctggt caccggggtg gtgttcttcc tcgcgtggga cctggtgggg 

151 atcgccaacg gactgttctg gcacggcgag aactccctga ccctggggat 

201 cttcgtggct cccgagctgc ccctggaaga ggtcttcttc ctcgcgttcc 

251 tctgctacca gaccatggtc tacgtgctcg gcgcgcccgt gctgtggcgg 

301 tggctgaggg cccgcaccgg cgcggcacac gcggggaggc gggcatga 

SEQ NO: 08 is the nucleic acid sequence for the IctB gene isolated from M. 
luteus. 

1 atgacgtact ggggcgtgaa cgcggtcttc ctggggatgg cggcggtcgt 
51 gctgctgacg acggcgctcg tgcggcgccc acccgcccgg ttctggggag 
101 cgctcgcggc ctccacagtg ctgctcgtgg tgctcaccgc cgtcttcgac 
151 aacgtcatga tcgcctccgg gatcatgacg tacacggacc gcaacatctc 
201 gggcgtgcgg atcgggctcg ccccgctgga ggacttcgcc taccccgtgg 
251 ccggtgtgct gctgctgccg acgatgtggc tgctgctggg aggcacgccc 
301 ggggcggcgg ccggtgacgg gcgggcgacg gcggcgtcgt cgtcctccgc 
351 ggtcgcagcc gcaaccgcag ccggcgcggg cgacgagaac gcgagcggtg 
401 aggacgcgga caccgatggt acgagcaccg ggcgcgcaca tgccgggggc 
451 aggcccagtg ggaaccccgc cgatggaagg gacgaaccgt gctga 

SEQ NO: 09 is the nucleic acid sequence for the IctC gene isolated from M. 
luteus. 

1 gtgctgagga cgctgttctg ggcctcgcgc ccgctgagct gggtgaacac 
51 cgcctacccg ttcgcggcgg ccgtgctgct gacgggcggt ttgccctggt 
101 ggctcgtggc gctgggggcc gtgttcttcc tggtgcccta caacctggcg 
151 atgtacggca tcaacgacgt cttcgactac gagtcggacc tgcgcaaccc 

201 ccgcaagggc ggcgtggagg gcgcggtggt ggatcgcgcc gcccagcgcg 

251 gcgtgctgcg ggcctcgtgc ctgctgccgg tgccgttcgt cgcggtgctg 

301 gcggggtacg ggatcgtgac cgggaacctg ctgtccgtgc tggtgctggc 

351 ggtgagcctg ttcgcggtgg tcgcgtactc gtgggcgggg ctgcgcttta 

401 aggagcgccc gttcgtggat gcgatgacct ccgccaccca cttcgtctcg 

451 cccgccgtct acggactggt gctcgcacgg gcggacttca cggtggggct 

501 gtgggcggtg ctcgtgggct tcttcctgtg gggcatggcc tcgcagatgt 

551 tcggggcggt gcaggacgtg gtaccggacc gtgagggtgg gctggcctcc 

601 gtggccaccg tgctcggtgc gcgccccacc gtgtggctcg cggcgggcct 

651 ctacgccctc gcaggtgccc tgatgctgct cgcccagtgg ccgggtcagc 

701 tcgcggcgct gctcgcggtg ccgtacctgg tcaacgcgct gcgcttccgg 

751 ggcgtcacgg acgaggactc cggccgggcc aacgccgggt ggaggacgtt 

801 cctgtggttg aactacgcga ccggtttcct ggtcacgatg ctgctgatct 

851 ggtgggcccg ggttcacgtg ctgtga 

SEQ NO: 10 is the amino acid sequence encoded by SEQ NO: 07. 

1 mylllllvll gcfalidrrw nlyfwsghpl rawlvlvtgv vfflawdlvg 

51 ianglfwhge nsltlgifva pelpleevff laflcyqtmv yvlgapvlwr 

101 wlrartgaah agrra 

SEQ NO: 11 is the amino acid sequence encoded by SEQ NO: 08. 

1 mtywgvnavf lgmaavvllt talvrrppar fwgalaastv llvvltavfd 

51 nvmiasgimt ytdrnisgvr iglapledfa ypvagvlllp tmwlllggtp 

101 gaaagdgrat aassssavaa ataagagden asgedadtdg tstgrahagg 

151 rpsgnpadgr depc 

SEQ NO: 12 is the amino acid sequence encoded by SEQ NO: 09. 

1 vlrtlfwasr plswvntayp faaavlltgg lpwwlvalga vfflvpynla 

51 mygindvfdy esdlrnprkg gvegavvdra aqrgvlrasc llpvpfvavl 

101 agygivtgnl lsvlvlavsl favvayswag lrfkerpfvd amtsathfvs 

151 pavyglvlar adftvglwav lvgfflwgma sqmfgavqdv vpdregglas 

201 vatvlgarpt vwlaaglyal agalmllaqw pgqlaallav pylvnalrfr 

251 gvtdedsgra nagwrtflwl nyatgflvtm lliwwarvhv l 

SEQ NO: 13 is the nucleic acid sequence for the idi gene isolated from A. 
mediolanus. 

1 atgaccgacc tcagcatcac gccgctgccg gcccaggccg caccggtgca 

51 gcccgcatcc agcgccgaat tggtcgtgct gctcgacgag gccggcaacc 

101 agatcggcac cgccccgaag tcgagcgtgc acggcgccga caccgccctc 

151 catctcgcgt tctcctgcca cgtcttcgac gacgacggcc gcctcctggt 

201 gacccgtcgc gcgctcggca aggtcgcctg gcccggcgtg tggaccaact 

251 ccttctgcgg gcaccccgcc ccggccgagc cgctgccgca cgcggtgcgc 

301 cgccgggccg agttcgagct cggcctcgag ctccgcgacg tcgagccggt 

351 gctgccgttc ttccgctacc gggcgacgga tgcctcgggc atcgtcgagc 

401 acgagatctg cccggtctac acggcgcgca caagctcggt gccggcgccg 

451 catcccgacg aggtcctcga cctcgcctgg gtcgaaccgg gcgagctcgc 

501 caccgcggtc cgcgccgcgc cctgggcgtt cagtccctgg ctcgtgctgc 

551 aggcgcagct gctgcccttc ctcggcggcc acgccgacgc gcgcgtccgc 

601 acggaagcgc tcgtctcgtg a 
SEQ NO: 14 is the nucleic acid sequence for the crtE gene isolated from A. 
mediolanus. 



1 gtgagcctcg tcgcgaccgt ggtcgccccg agccggcagg cggaggtgga 
51 gcgctacctc ggcggcttct tcgacgacgc catcgtgcgg gccgacgcgc 
101 acgccgccga ctaccggcgg ctctgggcgg cggcgcggga cgccgcgagc 
151 ggcggcaagc ggatccgccc caggctcgtg ctgggcgcct acgacgcgct 
201 cgccgcgcag ggtgcgccgg cgagcggccg cgaacgggcc gacgccgagc 
251 cggccgccgc cgcggaggcc gtggcgctcg cggcggcctt cgagctgctg 
301 cacaccgcgt tcctcgtgca cgacgacgtc atcgaccgcg acctcgtgcg 
351 ccggggcgag cccaacgtcg ccggccgctt cgcgctcgac gccgcgctgc 
401 gcgggctcga gcgggagcgg gcggacgcct acggccaggc ctcggcgatc 
451 ctcgcgggcg acctgctgat cgcggcggcg cactccgtgg cggccgcctc 
501 gacgtgccgg tcgagcgccg gcgagccatc ctcgccgtcc ttgacgaagt 
551 gcgtcttcgc cgccgccgcg ggcgagcacg ccgacgtccg gcacgccgcc 
601 ggggtgcggc ccggggaggc ggacatcctc gcgatgatcg aggacaagac 
651 ggcctgctac tcgttcagcg cgccgctccg ggcgggcgcg ctgctcgccg 
701 gcgccccgcg cgcgacggtc gaacggctcg gcgagatcgg ccgtcgactc 
751 ggcgtcgcct tccagctgca ggacgacgtg ctcggcgtct acggcgacga 
801 gcgggtgacc ggcaagacgg cgctcgggga cctccgcgag ggcaaggaga 
851 cgctgctcat cgcctacgcg cgggggcacg cggcctgggt cgcggcatcc 
901 ggcgccttcg gccggcccga cctcgacgag gcgggcgccc gccccctccg 
951 cgcggcgatc gaggcgagcg gcgcccgcgc ccgcgtcgag gcgcgcatcg 
1001 ccgaggaggc ggccgcggcg cgcacggcga tcgccgcggc gggcctgccc 
1051 gccgcgctcg aagccgagtt gctcggcctc gccgccgaag ccaccaggag 
1101 gtcgaggtga 



SEQ NO: 15 is the nucleic acid sequence for the crtB gene isolated from A. 
mediolanus. 



1 gtgagcacgc gcaccaccca gcgcacgacc gcgccgcccg caccgtccac 
51 cggcctcgcc ctctacgacc gcaccgccgc cgagggctcg gcccgggtca 
101 tccgggcgta ctcgacctcc ttcggcctcg cgagccggct ctgctccccc 
151 gccgtccgcg agcacctcgc cgaggtctac gcgctcgtgc gcatcgccga 
201 cgagctcgtc gacggcccgg ccgaggaggc cgggctgccg tgcgagcgcc 
251 gccgcgagct gctcgacgcc ctcgaggccg acacggaggc cgccttcgag 
301 agcggctaca gcgccaacct cgtggtgcac gccttcgcgc gcgcggcgcg 
351 gcgcagcggc ttcggccagg agctcacccg gcccttcttc gcctcgatgc 
401 gacgcgacct cgagcccatc gccttcaccg aggagcgcga gctcgacgaa 
451 tacgtctacg gctcggccga ggtcgtcggc ctgatgtgcc tgcgcggctt 
501 cgcgatcggg ctcgcccccg acgccgagcg cgacgcccgc tgggagcgcg 
551 gcgcgcgggc gctgggctcg gcgttccagc gggtcaactt cctgcgggac 
601 ctcggggagg atgcctcgct ccgcggacgc cgctacttcc cgggcgtcga 
651 tccggtgagc ttctcggagg cccagcaact gcgcctcctc gacggcatcg 
701 acgcggagct cgacgaggcg gccgccgtga tcccggagct gccccgcggc 
751 tgccgcgtcg cggtcgccgc ggcgcacggc ctgttcggcg agctctccgc 
801 ccggctccgc cgcacgcccg cggccgagct cgtcacccgg cgggtccggg 
851 tgcccgcgcc gcgcaagctc gccatcgtca cccgcgtggt cgcccgcgga 
901 ggccggccgt ga 
SEQ NO: 16 is the nucleic acid sequence for the crtI gene isolated from A. 
mediolanus. 



1 gtgagccgcg cggtcgtcat cggcggcggc atcgccgggc tcgccacggc 
51 ggcgctgctc gcccgcgacg ggcacgaggt gcggctcttc gaggcgcgcg 
101 acgagctcgg cggccgtgcc gggcgctggc gggcgaacgg cttcctgttc 
151 gacaccggtc cgagctggta cctcatgcca gaggtgttcg agcacttcta 
201 ccgcttgatg ggcaccacgg cggccgagga gctcgagctc gtgcgcctcg 
251 accccggcta ccgggtgtac ttcgagggct acgacgagcc ggtcgacgtg 
301 cgggccgagc gcgaggcatc catcgccctc ttcgagtcga tcgagccggg 
351 cgcgggcgcc gcgctcgccc ggcacctcga ctccgccaac gagacgtacc 
401 ggctcgcgat gacgcacttc ctctacaccg acttcgccca cccgggggcg 
451 ctgctcgccg cgccggtccg gcggcggctc ggccggctcg cgaagctgct 
501 gctcgaaccg ctcgaccgca tggtggggcg ctccttcgac gacgtgcggc 
551 tgcggcagat cctgggctac ccggcggtct tcctcggcac ctcgcccgag 
601 cgggcgccga gcatgtacca cctgatgagc cgcttcgacc tcgccgacgg 
651 ggtgttctac ccgatgggcg gcttcggcga gatcatcgcg agcgtggccc 
701 ggctggcccg gcgggccggg gccgagctcg tcaccggcgc gcgggtgctc 
751 ggcatcgaga cggccggcgg gcgcgccacg ggcgtgcgcg tgcagcacca 
801 cggcccgacc ggtggcaccg gcaccgagga gttcctggag gccgagctcg 
851 tcgtctccgc cgccgatctg caccacacgg atgccgagct gctcccgccc 
901 cgcgcgcgga cgcggagcga ggcatcctgg tcgcgccgcg accccggacc 
951 cggcgcggtg ctcgtcatgc tcggcgtgca cgggcggctg ccggagctcg 
1001 cccaccacac gctctgcttc acggccgact ggcgcacgaa cttccagcgg 
1051 gtgttcggct cgcgaccggc gatccccgac ccggcgtcgt tctacgtctg 
1101 ccgcccgagt gcgacggatc cgggcgtggc gccccccggc tgcgagaacc 
1151 tgttcctgct cgtgccggtg cccgccgacc ccacaatcgg cgccggcggt 
1201 gtcgacggcc gcggcgaccg ggcggtcgag gagacggccg accgggcgat 
1251 cgcgaccctc gccgagtggg ccggcatccc cgacctcgcc gagcggatcc 
1301 tcgtgcgccg cacgatcggg cccgcggact tcgaggactg gttccagtcc 
1351 tggcgcggct cggcgctcgg cccggggcac accctgcggc agagcgccat 
1401 gttccggggg cgcacggcct cggcgaacgt cgaggggctg tacttcgcgg 
1451 gggcgacgac gatcccgggc atcggcctgc cgatgtgcct gatcagcgcc 
1501 gagctcgtcg cgaaggccgt gcgcggcgag gatgccccgg gcccgctccc 
1551 ggagccgagc gaggagccgc acccagaccc gctgcaccca gacccgctgc 
1601 acccagaccg gctcgaccgg gagcgcaccg gatga 



SEQ NO: 17 is the amino acid sequence encoded by SEQ NO: 13. 

1 mtdlsitplp aqaapvqpas saelvvllde agnqigtapk ssvhgadtal 

51 hlafschvfd ddgrllvtrr algkvawpgv wtnsfcghpa paeplphavr 

101 rraefelgle lrdvepvlpf fryratdasg iveheicpvy tartssvpap 

151 hpdevldlaw vepgelatav raapwafspw lvlqaqllpf lgghadarvr 

201 tealvs 



SEQ NO: 18 is the amino acid sequence encoded by SEQ NO: 14. 

1 vslvatvvap srqaeveryl ggffddaivr adahaadyrr lwaaardaas 

51 ggkrirprlv lgaydalaaq gapasgrera daepaaaaea valaaafell 

101 htaflvhddv idrdlvrrge pnvagrfald aalrglerer adaygqasai 

151 lagdlliaaa hsvaaastcr ssagepssps ltkcvfaaaa gehadvrhaa 

201 gvrpgeadil amiedktacy sfsaplraga llagapratv erlgeigrrl 

251 gvafqlqddv lgvygdervt gktalgdlre gketlliaya rghaawvaas 

301 gafgrpdlde agarplraai easgararve ariaeeaaaa rtaiaaaglp 

351 aaleaellgl aaeatrrsr 
SEQ NO: 19 is the amino acid sequence encoded by SEQ NO: 15. 

1 vstrttqrtt appapstgla lydrtaaegs arviraysts fglasrlcsp 

51 avrehlaevy alvriadelv dgpaeeaglp cerrrellda leadteaafe 

101 sgysanlvvh afaraarrsg fgqeltrpff asmrrdlepi afteerelde 

151 yvygsaevvg lmclrgfaig lapdaerdar wergaralgs afqrvnflrd 

201 lgedaslrgr ryfpgvdpvs fseaqqlrll dgidaeldea aavipelprg 

251 crvavaaahg lfgelsarlr rtpaaelvtr rvrvpaprkl aivtrvvarg 

301 grp 



SEQ NO: 20 is the amino acid sequence encoded by SEQ NO: 16. 

1 vsravviggg iaglataall ardghevrlf eardelggra grwrangflf 

51 dtgpswylmp evfehfyrlm gttaaeelel vrldpgyrvy fegydepvdv 

101 raereasial fesiepgaga alarhldsan etyrlamthf lytdfahpga 

151 llaapvrrrl grlaklllep ldrmvgrsfd dvrlrqilgy pavflgtspe 

201 rapsmyhlms rfdladgvfy pmggfgeiia svarlarrag aelvtgarvl 

251 gietaggrat gvrvqhhgpt ggtgteefle aelvvsaadl hhtdaellpp 

301 rartrseasw srrdpgpgav lvmlgvhgrl pelahhtlcf tadwrtnfqr 

351 vfgsrpaipd pasfyvcrps atdpgvappg cenlfllvpv padptigagg 

401 vdgrgdrave etadraiatl aewagipdla erilvrrtig padfedwfqs 

451 wrgsalgpgh tlrqsamfrg rtasanvegl yfagattipg iglpmclisa 

501 elvakavrge dapgplpeps eephpdplhp dplhpdrldr ertg 



SEQ NO: 21 is the nucleic acid sequence for the crtE gene isolated from M. 
luteus. 



1 atgacctcgg agacagacac cgcggcggat cccaccgcgg tctgggatgt 
51 gttccgcgcg gccgttgacc gggagctgga cgagttcttc gactccccgc 
101 gcaacagggt tccctacagc ccgggcttcc cggtgatgtg ggatcgcatc 
151 cggcagcagg tggtgggcgg caagctgatc cggccccgtc tgacgcagat 
201 cgcgtggcgc tcgttcgccg gtgagtcgag cactgactcc ggccgagagg 
251 ccgagtgcgt gcgcctggcg gcgtcgttcg agatgctgca cgcggcgctg 
301 atcgtgcacg acgacgtcgt ggaccgggac tggcgccgtc gtgggcggcc 
351 cacggtgggc gagctcttcc gccgcgacgc ggtgcaggcg ggggcccccg 
401 agggcgaggc cgagcacgcg ggggagtccg cggcgatcct cgcgggagac 
451 ctgcttctgg cgggtgcgct gcggctggcg accacgtgca ccgaggaccc 
501 ggggcgggga cgtgccgtgg cagacgtggt cttcgaggcg gtgaccgcgt 
551 ccgcggccgg tgagctggac gacctcctgc tctctctgca ccgctacggc 
601 gcggagcacc cgggcgtgca ggacatcctg gacatggagc ggctgaagac 
651 cgccacgtac tcgttcgagg cacccctgcg cgccggcgcc ctgctcgcgg 
701 gagcgcccga ggagcaggcc cagcgcctgg cgcgggccgg cgcccagctc 
751 ggggtggcct accaggtcgt cgacgacgtc ctgggaacct tcggcgaccc 
801 cgagctcacc ggcaagtcgg tggacgccga tctgaactcg ggcaaggcca 
851 ccgtgctcac cgcccacgga atgcagaccc ccgcggtgcg ggacgtcctc 
901 gcggagctcg cggccgggcg taccacggtc gcctccgcgc gggctgccct 
951 gacggcgtcg ggagcgcagg aggcagccgt ggcagtggcc acggacctcg 
1001 tggaccgggc ccgggccacc ctggacggtc tcccgctgcc cgctgcccag 
1051 cgcgcggagc tcgacgcgct gtgccaccac gtcctgaaca gagactcgta 
1101 g 
SEQ NO: 22 is the nucleic acid sequence for the crtB gene isolated from M. 
luteus. 

1 gtgaggaccc ccaccatgcc ccaggacgca ccggccgacg cgccgctgag 

51 cctctacacc gccaccgcgc tggcggcctc gggcgcggtg atcgggcgct 

101 actccacgtc cttctcgctg gcgtgccgga ccctgccggc ggcggtgcgc 

151 cgggacatcg cggggatcta cgccctcgtg cgcgtggcgg acgaggtggt 

201 ggacgggacg gccggggcgg cgggtctcgg cgcggaccgg gtgcgcgcgg 

251 cgctcgacgc gtacgaggcc gaggtggcct ccgcgctcgc cacgggcttc 

301 tcgaccgacc tggtggtcca cggcttcgcg ggcgtcgccc gccgtcacgg 

351 cttcggcacg gagctcacgg agccgttctt cgcgtccatg cgcgcggacc 

401 tggacgtggc cgagcacgac ggcgcctcgc ttgagtccta catctacggc 

451 tcggcggagg tcgtggggct gatgtgcctg gaggtcttca tggacatgcc 

501 cggcacccgc gcccagaccc cggagcagcg ggagatgctg cgcgccacgg 

551 cccgccggct gggtgccgcg ttccagaagg tcaacttcct gcgggatctc 

601 ggcgcggacc acgaccagct cggacgcacc tacttccccg gcgcggaccc 

651 ctcccacctg gacgagaccc gcaagcggct gctgctcgcg gacctcggcg 

701 cggacctgga cgcggccgtg cccgggatcc tcgcgctgga ccgccgtgcc 

751 gggcgcgcgg tgctgatcgc gcacggactg ttcggtgagc tcgcacggcg 

801 gatcgaggag gtgcccgcgg cggagctcac acgacggcgc atcagcgtgc 

851 ccgccggggt gaagctgcgg atcgccgcga gagcgctgtc cgtcaccgcg 

901 cgcacgggct cacacgggcg gggccgagcc ctagagtcgg ggcccccggt 

951 gccggcggcc gtgcccgaaa cctcccggac gggggccacc cgatga 



SEQ NO: 23 is the nucleic acid sequence for the crtI gene isolated from M. 
luteus. 



1 atgacgcgca cggtggtgat cggcggcggc ttcgcgggcc tggccacggc 
51 gggcctgctc gcccgggacg ggcacagcgt caccctgctc gagcagcagg 
101 acacggtggg cggccgctcc gggcggtggt ccgcggaggg cttctcgttc 
151 gacaccggac ccagctggta cctcatgccc gaggtgatcg accgctggtt 
201 caccctgatg ggcacgagcg ccgccgagca gctggacctg cgccggctgg 
251 acccgggcta ccgcgtcttc ttcgaggacc acctggcgga accgcccacg 
301 gacgtggtca ccggtcgtgc cgaggagctg ttcgagagcc tcgacccggg 
351 atcctcccgc gcactgcgct cctacctgga ctcgggcgcg caggtctacg 
401 agctcgccaa gaagcacttc ctctacacgg acttcgccca cctgctggac 
451 cttgtgcgcc cggaggtgct ccgcaacctc ccgcggttgg caacgctgct 
501 gggcacgtcc atgaagaact acgttgcgcg ccgttttccg gagccgcggc 
551 agcgccagat cctgggctac cccgccgtct tcctgggggc gtccccctcg 
601 tccgccccgg ccatgtacca cctcatgagc cacctggacc tcaccgacgg 
651 agtgcagtac ccggtgggcg ggttcgccgc gctggtggac gccatggaac 
701 ggctcgtgcg cgaggccggc gtggagatcg tcacgggagc caccgtgacc 
751 ggcatcgagg tggctcccga gccgcggtcg ccgcgttccc ggttggccgc 
801 agcccgggca cgacgtcgca ccgccggcac ggtcacgggc gtcaccttcc 
851 gcacggcgcc gggggcggac ccggggacgg agccgggcgg cgtcgtcgcc 
901 ggtgcggagg tcaccgtgcc cgcggacgtc gtcgtcggcg ccgcggacct 
951 gcaccacctc cagacccgcc tgcttcccgg cccgttccgc gcaccggagt 
1001 cccgctggaa gcgccgcgac cccgggccct ccggggtgct cgtgtgcctg 
1051 ggcgtgcgcg ggaagctgcc gcagctggcc caccacaacc tgctgttcac 
1101 cgcggactgg gatgagaact tcgggcgcat cgagtccggt gcggacctgg 
1151 ccgaggagac ctcgatctac gtgtccatga cgtcggcgac ggatcccggc 
1201 accgcgcccg agggggacga gaacctgttc atcctggtgc cctcgcccgc 
1251 ggcacccgag tggggtcacg gcggaaccac cgccccgggc gtcgacgagc 
1301 ccggctccgc gcaggtggag cgggtcgctg acgccgccat cgcgcagctc 
1351 gcgcgctggg cgcagatccc ggacctggcc tcgcggatcg tggtgcgcag 

1401 gacctacggg cccgaggact tcgcggtggg ggtcaacgcg tggcgcggct 

1451 ccctgctggg ccccggacac attctgacgc agtccgcgat gttccgtccc 

1501 agcgtcaccg accgtgggat ccgggggctg ttctacgccg ggtcctcggt 

1551 gcgcccgggg atcggcgtgc ccatgtgcct gatctcctcc gaggtggtgc 

1601 gggacgccgt gcgggagagc ggggcgcgct ga 



SEQ ID NO: 24 is the amino acid sequence encoded by SEQ ID NO: 21. 

1 mtsetdtaad ptavwdvfra avdreldeff dsprnrvpys pgfpvmwdri 

51 rggvvggkli rprltqiawr sfagesstds greaecvrla asfemlhaal 

101 ivhddvvdrd wrrrgrptvg elfrrdavqa gapegeaeha gesaailagd 

151 lllagalrla ttctedpgrg ravadvvfea vtasaageld dlllslhryg 

201 aehpgvgdil dmerlktaty sfeaplraga llagapeeqa qrlaragaql 

251 gvayqvvddv lgtfgdpelt gksvdadlns gkatvltahg mqtpavrdvl 

301 aelaagrttv asaraaltas gageaavava tdlvdrarat ldglplpaaq 

351 raeldalchh vlnrds 

SEQ ID NO: 25 is the amino acid sequence encoded by SEQ ID NO: 22. 

1 vrtptmpqda padaplslyt atalaasgav igxystsfsl acrtlpaavr 

51 rdiagiyalv rvadevvdgt agaaglgadr vraaldayea evasalatgf 

101 stdlvvhgfa gvarrhgfgt eltepffasm radldvaehd gaslesyiyg 

151 saevvglmcl evfmdmpgtr aqtpeqreml ratarrlgaa fqkvnflrdl 

201 gadhdqlgrt yfpgadpshl detrkrllla dlgadldaav pgilaldrra 

251 gravliahgl fgelarriee vpaaeltrrr isvpagvklr iaaralsvta 

301 rtgshgrgra lesgppvpaa vpetsrtgat r 
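
As an illustrative cross-check (not part of the original listing), the pairing of a nucleic acid listing with its encoded amino acid listing can be sketched in Python. The snippet assumes the standard genetic code and translates the last two rows (positions 901 to 996) of the SEQ ID NO: 22 listing, which should reproduce the final row of SEQ ID NO: 25. The names `CODON_TABLE`, `clean`, and `translate` are hypothetical helpers introduced here for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order (assumption:
# the listed genes use the standard table for internal codons).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def clean(listing: str) -> str:
    """Drop position numbers and whitespace, keeping only a/c/g/t."""
    return "".join(ch for ch in listing.lower() if ch in BASES)


def translate(listing: str) -> str:
    """Translate in frame 1 and stop at the first stop codon."""
    seq = clean(listing)
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Rows 901 and 951 of the SEQ ID NO: 22 listing; position 901 starts a codon.
TAIL_SEQ22 = """
901 cgcacgggct cacacgggcg gggccgagcc ctagagtcgg ggcccccggt
951 gccggcggcc gtgcccgaaa cctcccggac gggggccacc cgatga
"""
# Final row of the SEQ ID NO: 25 listing.
TAIL_SEQ25 = "rtgshgrgra lesgppvpaa vpetsrtgat r"

print(translate(TAIL_SEQ22))
print(TAIL_SEQ25.replace(" ", "").upper())
```

Both lines print the same 31-residue string when the listings agree. Note that `clean` silently discards ambiguous bases such as `n`, which would shift the reading frame, so this sketch is only suitable for listings that contain a, c, g, and t alone.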

SEQ ID NO: 26 is the amino acid sequence encoded by SEQ ID NO: 23. 

1 mtrtvviggg faglatagll ardghsvtll eqqdtvggrs grwsaegfsf 

51 dtgpswylmp evidrwftlm gtsaaeqldl rrldpgyrvf fedhlaeppt 

101 dvvtgraeel fesldpgssr alrgyldsga qvyelakkhf lytdfahlld 

151 lvrpevlrnl prlatllgts mknyvarrfp eprqrqilgy pavflgasps 

201 sapamyhlms hldltdgvqy pvggfaalvd amerlvreag veivtgatvt 

251 gievapeprs prsrlaaara rrrtagtvtg vtfrtapgad pgtepggvva 

301 gaevtvpadv vvgaadlhhl qtrllpgpfr apesrwkrrd pgpsgvlvcl 

351 gvrgklpqla hhnllftadw denfgriesg adlaeetsiy vsmtsatdpg 

401 tapegdenlf ilvpspaape wghggttapg vdepgsaqve rvadaaiaql 

451 arwaqipdla srivvrrtyg pedfavgvna wrgsllgpgh iltqsamfrp 

501 svtdrgirgl fyagssvrpg igvpmcliss evvrdavres gar 

SEQ ID NOS: 27-30 are primers used to amplify regions of the Y1 operon. 

AIDINDEF 5'-TTCATATGTCACTAGCCAGGCGAGATATCC-3' 

APDHIIIR 5'-GAAAGCTTAAGAAGATGCCGAGCGAGATG-3' 

AXHIIIR 5'-AGAAGCTTTGTACGGCACGAGGAAGAACAG-3' 

AYHIIIR 5'-GAAAGCTTCTCCGTGACGAGATCCTGAG-3' 



SEQ ID NOS: 31 and 32 are primers used to amplify ORFY. 

AYPACF 5'-GTCTTAATTAACTGCTGCTCTGCTCCACGGTCT-3' 
AYXBAR 5'-TATCTAGACGCTCCGTGACGAGATCCTGAG-3' 

SEQ ID NO: 33 is a primer used to amplify the region of Agromyces 
mediolanus genomic DNA containing the X1, X2, and Y ORFs. 

AXSPHF 5'-TAGGCATGCAACGTCGAGGGGCTGTACTTC-3' 

SEQ ID NOS: 34 and 35 are primers used to amplify a mutated ORFX1 
fragment. 

X1A 5'-GCTCGTCGACGCGCGCTAGCCGGCTGTTCTTCTGG-3' 
X1B 5'-CCAGAAGAACAGCCGGCTAGCGCGCGTCGACGAGC-3' 

SEQ ID NOS: 36 and 37 are primers used to amplify a mutated 
ORFX2 fragment. 

X2A 5'-GGAACGGGAGGCAGAGCAGGCTAGCTCATCGGCGGGCCCTTCG -3' 
X2B 5'-GGGCCCGCCGATGAGCTAGCCTGCTCTGCCTCCCGTTCC -3' 

SEQ ID NOS: 38 and 39 are primers used to amplify a mutated ORFY fragment. 
YA 5'-GTGTTGATCCAGCTAGCGGGCGCGATGCGGTGAAG -3' 
YB 5'-TTCACCGCATCGCGCCCGCTAGCTGGATCAACACC -3' 
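
The two primers in each of the mutated ORFX1, ORFX2, and ORFY pairs appear to be largely reverse complements of one another, as is common for overlapping mutagenic primers. A minimal Python sketch for checking this against the sequences as printed is given below; `reverse_complement`, `longest_shared_run`, and `shared_lengths` are hypothetical helper names introduced here, and the check says nothing about how the primers were actually designed or used.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_shared_run(a: str, b: str) -> str:
    """Longest contiguous substring of `a` that also occurs in `b`."""
    best = ""
    for i in range(len(a)):
        # Only try candidates longer than the best run found so far.
        for j in range(i + len(best) + 1, len(a) + 1):
            if a[i:j] in b:
                best = a[i:j]
            else:
                break  # a longer substring starting at i cannot match either
    return best


# Primer pairs copied from the listing (5' to 3').
PAIRS = {
    "X1": ("GCTCGTCGACGCGCGCTAGCCGGCTGTTCTTCTGG",
           "CCAGAAGAACAGCCGGCTAGCGCGCGTCGACGAGC"),
    "X2": ("GGAACGGGAGGCAGAGCAGGCTAGCTCATCGGCGGGCCCTTCG",
           "GGGCCCGCCGATGAGCTAGCCTGCTCTGCCTCCCGTTCC"),
    "Y": ("GTGTTGATCCAGCTAGCGGGCGCGATGCGGTGAAG",
          "TTCACCGCATCGCGCCCGCTAGCTGGATCAACACC"),
}


def shared_lengths(pairs):
    """Length of the complementary overlap for each A/B primer pair."""
    return {
        name: len(longest_shared_run(a, reverse_complement(b)))
        for name, (a, b) in pairs.items()
    }


for name, length in shared_lengths(PAIRS).items():
    print(name, length)
```

For the sequences as printed, X1A and X1B are exact reverse complements (35 nucleotides), X2B is complementary to the first 39 nucleotides of X2A, and YA and YB overlap over 34 nucleotides with a one-base offset.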



SEQ ID NOS: 40 and 41 are primers used to make a probe to identify M. 
luteus homologs. 

ORFYF: 5'- AGAGGAGCCGAGCGATGAG -3' 
ORFYR: 5'- CGTACCAGATCAGCAGCATC -3' 



SEQ ID NOS: 42-45 are primers used for M. luteus genomic walking. 

GSP1F: 5'-TTCATGGACGTGCCCAGCAGCGTTGCCA-3' 
GSP2F: 5'-AGGTGGGCGAAGTCCGTGTAGAGGAAG-3' 
GSP1F2: 5'-AAGTAGGTGCGTCCGAGCTGGTCGTGGT-3' 
GSP2F2: 5'-GTCCGCGCCGAGATCCCGCAGGAAGTT-3' 